mirror of https://github.com/cisagov/manage.get.gov.git
synced 2025-08-16 06:24:12 +02:00
Update data_migration.md
parent f04e3a8339
commit 63727b83fb
1 changed file with 34 additions and 23 deletions
|
@ -11,7 +11,7 @@ because we have a different data model and workflow model. Instead, we should
|
||||||
focus our migration efforts on creating a state in our new registrar that will
|
focus our migration efforts on creating a state in our new registrar that will
|
||||||
primarily allow users of the system to perform the tasks that they want to do.
|
primarily allow users of the system to perform the tasks that they want to do.
|
||||||
|
|
||||||
## Users
|
#### Users
|
||||||
|
|
||||||
One of the major differences with the existing registrar/registry is that our
|
One of the major differences with the existing registrar/registry is that our
|
||||||
system uses Login.gov for authentication. Any person with an identity-verified
|
system uses Login.gov for authentication. Any person with an identity-verified
|
||||||
|
@ -21,7 +21,7 @@ account in our user table. Because we cannot know the Universal Unique ID (UUID)
|
||||||
for a person's Login.gov account, we cannot pre-create user accounts for
|
for a person's Login.gov account, we cannot pre-create user accounts for
|
||||||
individuals in our new registrar based on the original data.
|
individuals in our new registrar based on the original data.
|
||||||
|
|
||||||
## Domains
|
#### Domains
|
||||||
|
|
||||||
Our registrar keeps track of domains. The authoritative source for domain
|
Our registrar keeps track of domains. The authoritative source for domain
|
||||||
information is the registry, but the registrar needs a copy of that
|
information is the registry, but the registrar needs a copy of that
|
||||||
|
@ -42,7 +42,7 @@ locally for testing, using Docker Compose:
|
||||||
docker compose run -T app ./manage.py load_domains_data < /tmp/escrow_domains.daily.dotgov.GOV.txt
|
docker compose run -T app ./manage.py load_domains_data < /tmp/escrow_domains.daily.dotgov.GOV.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
## User access to domains
|
#### User access to domains
|
||||||
|
|
||||||
The data export contains a `escrow_domain_contacts.daily.dotgov.txt` file
|
The data export contains a `escrow_domain_contacts.daily.dotgov.txt` file
|
||||||
that links each domain to three different types of contacts: `billing`,
|
that links each domain to three different types of contacts: `billing`,
|
||||||
|
@ -78,9 +78,9 @@ An example script using this technique is in
|
||||||
docker compose run app ./manage.py load_domain_invitations /app/escrow_domain_contacts.daily.dotgov.GOV.txt /app/escrow_contacts.daily.dotgov.GOV.txt
|
docker compose run app ./manage.py load_domain_invitations /app/escrow_domain_contacts.daily.dotgov.GOV.txt /app/escrow_contacts.daily.dotgov.GOV.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
## Transition Domains (Part 1) - Setup Files for Import
|
## Transition Domains (Part 1) - Set Up Files for Import
|
||||||
|
|
||||||
#### STEP 1: Obtain data files
|
#### Step 1: Obtain data files
|
||||||
We are provided with information about Transition Domains in the following files:
|
We are provided with information about Transition Domains in the following files:
|
||||||
| | Filename | Description |
|
| | Filename | Description |
|
||||||
|:-| :-------------------------------------------- | :---------- |
|
|:-| :-------------------------------------------- | :---------- |
|
||||||
|
@ -95,7 +95,7 @@ We are provided with information about Transition Domains in the following files
|
||||||
|9| **agency.adhoc.dotgov.txt** | Has federal agency data
|
|9| **agency.adhoc.dotgov.txt** | Has federal agency data
|
||||||
|10| **migrationFilepaths.json** | A JSON file that points to all given filenames. Specified below.
|
|10| **migrationFilepaths.json** | A JSON file that points to all given filenames. Specified below.
|
||||||
|
|
||||||
#### STEP 2: Obtain JSON file (for file locations)
|
#### Step 2: Obtain JSON file (for file locations)
|
||||||
Add a JSON file called "migrationFilepaths.json" with the following contents (update filenames and directory as needed):
|
Add a JSON file called "migrationFilepaths.json" with the following contents (update filenames and directory as needed):
|
||||||
```
|
```
|
||||||
{
|
{
|
||||||
|
@ -120,21 +120,21 @@ Later on, we will bundle this file along with the others into its own folder. Ke
|
||||||
We need to run a few scripts to parse these files into our domain tables.
|
We need to run a few scripts to parse these files into our domain tables.
|
||||||
We can do this both locally and in a sandbox.
|
We can do this both locally and in a sandbox.
|
||||||
|
|
||||||
#### STEP 3: Bundle all relevant data files into an archive
|
#### Step 3: Bundle all relevant data files into an archive
|
||||||
Move all the files specified in Step 1 into a shared folder, and create a tar.gz.
|
Move all the files specified in Step 1 into a shared folder, and create a tar.gz.
|
||||||
|
|
||||||
Create a folder on your desktop called `datafiles` and move all of the obtained files into that. Add these files to a tar.gz archive using any method. See (here)[https://stackoverflow.com/questions/53283240/how-to-create-tar-file-with-7zip].
|
Create a folder on your desktop called `datafiles` and move all of the obtained files into that. Add these files to a tar.gz archive using any method. See [here](https://stackoverflow.com/questions/53283240/how-to-create-tar-file-with-7zip).
|
||||||
|
|
||||||
After this is created, move this archive into `src/migrationdata`.
|
After this is created, move this archive into `src/migrationdata`.
|
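A minimal sketch of Step 3 from a shell, assuming `tar` is available and that you run it from the repository root. The archive name `datafiles.tar.gz` is only an illustrative choice; the text above does not prescribe one.

```bash
# Folder named in the step above; the archive name is a hypothetical choice.
DATA_DIR="datafiles"
ARCHIVE="datafiles.tar.gz"

mkdir -p "$DATA_DIR" src/migrationdata

# Copy the files from Step 1 into "$DATA_DIR" before archiving.
# -C keeps the paths inside the archive relative to that folder.
tar -czf "$ARCHIVE" -C "$DATA_DIR" .

# List the contents to confirm the archive is readable.
tar -tzf "$ARCHIVE"

# Move the archive to the location the migration steps expect.
mv "$ARCHIVE" src/migrationdata/
```

Archiving with `-C` means the files unpack directly into whatever folder you extract in, rather than into a nested `datafiles/` directory.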
||||||
|
|
||||||
|
|
||||||
### SECTION 1 - SANDBOX MIGRATION SETUP
|
### Set Up Migrations on Sandbox
|
||||||
Load migration data onto a production or sandbox environment
|
Load migration data onto a production or sandbox environment
|
||||||
|
|
||||||
**WARNING:** All files uploaded in this manner are temporary, i.e. they will be deleted when the app is restaged.
|
**WARNING:** All files uploaded in this manner are temporary, i.e. they will be deleted when the app is restaged.
|
||||||
Do not use these environments to store data you want to keep around permanently. We don't want sensitive data to be accidentally present in our application environments.
|
Do not use these environments to store data you want to keep around permanently. We don't want sensitive data to be accidentally present in our application environments.
|
||||||
|
|
||||||
#### STEP 1: Using cat to transfer data to sandboxes
|
#### Step 1: Using cat to transfer data to sandboxes
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cat {LOCAL_PATH_TO_FILE} | cf ssh {APP_NAME_IN_ENVIRONMENT} -c "cat > /home/vcap/tmp/{DESIRED_NAME_OF_FILE}"
|
cat {LOCAL_PATH_TO_FILE} | cf ssh {APP_NAME_IN_ENVIRONMENT} -c "cat > /home/vcap/tmp/{DESIRED_NAME_OF_FILE}"
|
||||||
|
@ -144,17 +144,22 @@ cat {LOCAL_PATH_TO_FILE} | cf ssh {APP_NAME_IN_ENVIRONMENT} -c "cat > /home/vcap
|
||||||
* LOCAL_PATH_TO_FILE - Path to the file you want to copy, ex: src/tmp/escrow_contacts.daily.gov.GOV.txt
|
* LOCAL_PATH_TO_FILE - Path to the file you want to copy, ex: src/tmp/escrow_contacts.daily.gov.GOV.txt
|
||||||
* DESIRED_NAME_OF_FILE - Use this to specify the filename and type, ex: test.txt or escrow_contacts.daily.gov.GOV.txt
|
* DESIRED_NAME_OF_FILE - Use this to specify the filename and type, ex: test.txt or escrow_contacts.daily.gov.GOV.txt
|
||||||
|
|
||||||
**TROUBLESHOOTING:** Depending on your operating system (Windows for instance), this command may upload corrupt data. If you encounter the error `gzip: prfiles.tar.gz: not in gzip format` when trying to unzip a .tar.gz file, use the scp command instead.
|
#### TROUBLESHOOTING STEP 1
|
||||||
|
Depending on your operating system (Windows for instance), this command may upload corrupt data. If you encounter the error `gzip: prfiles.tar.gz: not in gzip format` when trying to unzip a .tar.gz file, use the scp command instead.
|
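One way to catch this kind of corruption before unpacking is to test the archive with `gzip -t`. The sketch below is only an illustration: the helper name `verify_archive` and the throwaway file names are hypothetical, not part of the project.

```bash
# Returns 0 when the file is a valid gzip stream, non-zero otherwise.
verify_archive() {
  gzip -t "$1" 2>/dev/null
}

# Demonstration with throwaway files (names are illustrative only).
workdir="$(mktemp -d)"
printf 'sample\n' > "$workdir/sample.txt"
tar -czf "$workdir/good.tar.gz" -C "$workdir" sample.txt
printf 'not a gzip stream\n' > "$workdir/bad.tar.gz"

verify_archive "$workdir/good.tar.gz" && echo "good.tar.gz: ok"
verify_archive "$workdir/bad.tar.gz" || echo "bad.tar.gz: not in gzip format"
```

On a sandbox, the same check would be run against the uploaded file under `/home/vcap/tmp/`; if it fails, fall back to scp.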
||||||
#### STEP 1 (Alternative): Using scp to transfer data to sandboxes
|
|
||||||
**IMPORTANT:** Only follow these steps if cat does not work as expected. If it does, skip to step 2.
|
|
||||||
|
|
||||||
|
**IMPORTANT:** Only follow the below troubleshooting steps if cat does not work as expected. If it does, skip to step 2.
|
||||||
|
<details>
|
||||||
|
<summary>Troubleshooting cat instructions
|
||||||
|
</summary>
|
||||||
|
|
||||||
|
#### Use scp to transfer data to sandboxes.
|
||||||
Cloud Foundry supports scp as a means of transferring local data to our environment. If you are dealing with a batch of files, try sending across a tar.gz and unpacking that.
|
Cloud Foundry supports scp as a means of transferring local data to our environment. If you are dealing with a batch of files, try sending across a tar.gz and unpacking that.
|
||||||
|
|
||||||
##### Login to Cloud.gov
|
##### Login to Cloud.gov
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cf login -a api.fr.cloud.gov --sso
|
cf login -a api.fr.cloud.gov --sso
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Target your workspace
|
##### Target your workspace
|
||||||
|
@ -187,8 +192,10 @@ cf ssh-code
|
||||||
Copy this code into the password prompt from earlier.
|
Copy this code into the password prompt from earlier.
|
||||||
|
|
||||||
NOTE: You can use different utilities to copy this onto the clipboard for you. If you are on Windows, try the command `cf ssh-code | clip`. On Mac, this will be `cf ssh-code | pbcopy`
|
NOTE: You can use different utilities to copy this onto the clipboard for you. If you are on Windows, try the command `cf ssh-code | clip`. On Mac, this will be `cf ssh-code | pbcopy`
|
||||||
|
</details>
|
||||||
|
|
||||||
#### STEP 2: Transfer uploaded files to the getgov directory
|
|
||||||
|
#### Step 2: Transfer uploaded files to the getgov directory
|
||||||
Because of how Cloud.gov operates, the getgov directory is generated dynamically under the tmp/ folder whenever the app is built. We can upload files directly to the tmp/ folder, but we cannot target the generated getgov folder directly, because we need to spin up a shell to access it. From that shell, we can move the uploaded files into the getgov directory using the `cat` command. Note that you will have to repeat this for each file you want to move, so for multiple files it is better to use a tar.gz and unpack it inside the `migrationdata` folder.
|
Because of how Cloud.gov operates, the getgov directory is generated dynamically under the tmp/ folder whenever the app is built. We can upload files directly to the tmp/ folder, but we cannot target the generated getgov folder directly, because we need to spin up a shell to access it. From that shell, we can move the uploaded files into the getgov directory using the `cat` command. Note that you will have to repeat this for each file you want to move, so for multiple files it is better to use a tar.gz and unpack it inside the `migrationdata` folder.
|
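If you do need to move several loose files by hand, the per-file `cat` can be wrapped in a loop. This is a rough sketch under stated assumptions: the helper name `copy_into_migrationdata` and the demo file names are hypothetical, and the directory layout simply mirrors the `../tmp` to `migrationdata` move described in this guide.

```bash
# copy_into_migrationdata SRC_DIR DEST_DIR EXTENSION (hypothetical helper)
# Copies every file with the given extension using cat, one file at a time.
copy_into_migrationdata() {
  local src_dir="$1" dest_dir="$2" ext="$3" f
  mkdir -p "$dest_dir"
  for f in "$src_dir"/*."$ext"; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    cat "$f" > "$dest_dir/$(basename "$f")"
  done
}

# Demonstration with throwaway files (names are illustrative only).
demo="$(mktemp -d)"
mkdir -p "$demo/tmp"
printf 'a\n' > "$demo/tmp/one.txt"
printf 'b\n' > "$demo/tmp/two.txt"
copy_into_migrationdata "$demo/tmp" "$demo/migrationdata" txt
ls "$demo/migrationdata"
```

From inside the getgov directory on a sandbox, the equivalent call would be `copy_into_migrationdata ../tmp migrationdata txt`.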
||||||
|
|
||||||
##### SSH into your sandbox
|
##### SSH into your sandbox
|
||||||
|
@ -209,10 +216,14 @@ cf ssh {APP_NAME_IN_ENVIRONMENT}
|
||||||
```
|
```
|
||||||
|
|
||||||
This will look for all files in /tmp that have the same file type as `FILE_EXTENSION_TYPE`.
|
This will look for all files in /tmp that have the same file type as `FILE_EXTENSION_TYPE`.
|
||||||
**Example 1: txt**
|
|
||||||
|
**Example 1: Transferring txt files**
|
||||||
|
|
||||||
`./manage.py cat_files_into_getgov --file_extension txt` will search for
|
`./manage.py cat_files_into_getgov --file_extension txt` will search for
|
||||||
all files with the .txt extension.
|
all files with the .txt extension.
|
||||||
**Example 2: .tar.gz**
|
|
||||||
|
**Example 2: Transferring tar.gz files**
|
||||||
|
|
||||||
`./manage.py cat_files_into_getgov --file_extension tar.gz` will search
|
`./manage.py cat_files_into_getgov --file_extension tar.gz` will search
|
||||||
for .tar.gz files.
|
for .tar.gz files.
|
||||||
|
|
||||||
|
@ -237,7 +248,7 @@ cat ../tmp/{filename} > migrationdata/{filename}
|
||||||
|
|
||||||
*You are now ready to run migration scripts (see [Running the Migration Scripts](running-the-migration-scripts))*
|
*You are now ready to run migration scripts (see [Running the Migration Scripts](running-the-migration-scripts))*
|
||||||
|
|
||||||
### SECTION 2 - LOCAL MIGRATION SETUP (TESTING PURPOSES ONLY)
|
### Set Up Migrations on Local (TESTING PURPOSES ONLY)
|
||||||
|
|
||||||
***IMPORTANT: only use test data, to avoid publicizing PII in our public repo.***
|
***IMPORTANT: only use test data, to avoid publicizing PII in our public repo.***
|
||||||
|
|
||||||
|
@ -253,7 +264,7 @@ This will allow Docker to mount the files to a container (under `/app`) for our
|
||||||
## Transition Domains (Part 2) - Running the Migration Scripts
|
## Transition Domains (Part 2) - Running the Migration Scripts
|
||||||
While keeping the same ssh instance open (if you are running on a sandbox), run through the following commands. If you cannot run `manage.py` commands, try running `/tmp/lifecycle/shell` in the ssh instance.
|
While keeping the same ssh instance open (if you are running on a sandbox), run through the following commands. If you cannot run `manage.py` commands, try running `/tmp/lifecycle/shell` in the ssh instance.
|
||||||
|
|
||||||
### STEP 1: Load Transition Domains
|
### Step 1: Load Transition Domains
|
||||||
|
|
||||||
Run the following command, making sure the file paths point to the right location of your migration files. This will parse all given files and
|
Run the following command, making sure the file paths point to the right location of your migration files. This will parse all given files and
|
||||||
load the information into the TransitionDomain table. Make sure you have your migrationFilepaths.json file in the same directory.
|
load the information into the TransitionDomain table. Make sure you have your migrationFilepaths.json file in the same directory.
|
||||||
|
@ -315,7 +326,7 @@ Defines the filename for domain type adhocs.
|
||||||
`--infer_filenames`
|
`--infer_filenames`
|
||||||
Determines if we should infer filenames or not. This setting is not available for use in environments with the flag `settings.DEBUG` set to false, as it is intended for local development only.
|
Determines if we should infer filenames or not. This setting is not available for use in environments with the flag `settings.DEBUG` set to false, as it is intended for local development only.
|
||||||
|
|
||||||
### STEP 2: Transfer Transition Domain data into main Domain tables
|
### Step 2: Transfer Transition Domain data into main Domain tables
|
||||||
|
|
||||||
Now that we've loaded all the data into TransitionDomain, we need to update the main Domain and DomainInvitation tables with this information.
|
Now that we've loaded all the data into TransitionDomain, we need to update the main Domain and DomainInvitation tables with this information.
|
||||||
In the same terminal as used in STEP 1, run the command below:
|
In the same terminal as used in STEP 1, run the command below:
|
||||||
|
@ -339,7 +350,7 @@ This will print out additional, detailed logs.
|
||||||
Directs the script to load only the first 100 entries into the table. You can adjust this number as needed for testing purposes.
|
Directs the script to load only the first 100 entries into the table. You can adjust this number as needed for testing purposes.
|
||||||
**Note:** `--limitParse` is currently experiencing issues and may not work as intended.
|
**Note:** `--limitParse` is currently experiencing issues and may not work as intended.
|
||||||
|
|
||||||
### STEP 3: Send Domain invitations
|
### Step 3: Send Domain invitations
|
||||||
|
|
||||||
To send invitation emails for every transition domain in the transition domain table, execute the following command:
|
To send invitation emails for every transition domain in the transition domain table, execute the following command:
|
||||||
|
|
||||||
|
@ -352,7 +363,7 @@ docker compose run -T app ./manage.py send_domain_invitations -s
|
||||||
./manage.py send_domain_invitations -s
|
./manage.py send_domain_invitations -s
|
||||||
```
|
```
|
||||||
|
|
||||||
### STEP 4: Test the results (Run the analyzer script)
|
### Step 4: Test the results (Run the analyzer script)
|
||||||
|
|
||||||
This script's main function is to scan the transition domain and domain tables for any anomalies. It produces a simple report of missing or duplicate data. NOTE: some missing data might be expected depending on the nature of our migrations, so use your best judgement when evaluating the results.
|
This script's main function is to scan the transition domain and domain tables for any anomalies. It produces a simple report of missing or duplicate data. NOTE: some missing data might be expected depending on the nature of our migrations, so use your best judgement when evaluating the results.
|
||||||
|
|
||||||
|
|