mirror of
https://github.com/cisagov/manage.get.gov.git
synced 2025-08-14 05:29:43 +02:00
Merge pull request #534 from cisagov/nmb/migration
commit
8a22c6f9a5
3 changed files with 224 additions and 0 deletions
79
docs/operations/data_migration.md
Normal file
@@ -0,0 +1,79 @@
# Registrar Data Migration

There is an existing registrar/registry at Verisign. They will provide us with an
export of the data from that system. The goal of our data migration is to take
the provided data and use it to create, as far as possible, a _matching_ state
in our registrar.

There is no way to make our registrar _identical_ to the Verisign system
because we have a different data model and workflow model. Instead, we should
focus our migration efforts on creating a state in our new registrar that
allows users of the system to perform the tasks that they want to do.

## Users

One of the major differences with the existing registrar/registry is that our
system uses Login.gov for authentication. Any person with an identity-verified
Login.gov account can make an account on the new registrar, and the first time
that person logs in through Login.gov, we make a corresponding account in our
user table. Because we cannot know the universally unique identifier (UUID) for
a person's Login.gov account in advance, we cannot pre-create user accounts for
individuals in our new registrar based on the data from Verisign.

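The constraint can be illustrated with a minimal, self-contained sketch. It is
not the registrar's actual code: `User`, `UserTable`, and
`get_or_create_on_login` are hypothetical names, and an in-memory dictionary
stands in for the real user table. The point is only that the row is keyed by
the Login.gov UUID, which becomes known at the first login and not before.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    uuid: str  # identifier supplied by Login.gov at authentication time
    email: str


@dataclass
class UserTable:
    """Hypothetical in-memory stand-in for the registrar's user table."""

    rows: dict = field(default_factory=dict)

    def get_or_create_on_login(self, uuid, email):
        """Return (user, created); the row is created on the first login only."""
        if uuid in self.rows:
            return self.rows[uuid], False
        user = User(uuid=uuid, email=email.lower())
        self.rows[uuid] = user
        return user, True
```

Because the key is the UUID and the Verisign export only carries email
addresses, a migration script has nothing to put in the key column ahead of
time.
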
## Domains

Our registrar keeps track of domains. The authoritative source for domain
information is the registry, but the registrar needs a copy of that
information to make connections between registry users and the domains that
they manage. The registrar stores very few fields about a domain other than
its name, so it could be straightforward to import the exported list of domains
from Verisign's `escrow_domains.daily.dotgov.GOV.txt`. It doesn't appear that
that file stores a flag for active or inactive, so every domain in the file
can be imported into our system as `is_active=True`.

An example Django management command that can load the delimited text file
from the daily escrow is in
`src/registrar/management/commands/load_domains_data.py`. It uses Django's
object-relational mapper (ORM) to create Django objects for the domains and
then write them to the database in a single bulk operation. To run the command
locally for testing, using Docker Compose:

```shell
docker compose run -T app ./manage.py load_domains_data < /tmp/escrow_domains.daily.dotgov.GOV.txt
```

## User access to domains

The Verisign data contains an `escrow_domain_contacts.daily.dotgov.txt` file
that links each domain to three different types of contacts: `billing`,
`tech`, and `admin`. The ID of the contact in this linking table corresponds
to the ID of a contact in the `escrow_contacts.daily.dotgov.txt` file. The
contacts file contains an email address for each contact.

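The relationship between the two files can be sketched as a join on the
contact ID. This is an illustration only: the function name
`emails_by_domain` is hypothetical, the column positions (domain, contact ID,
role in the linking file; contact ID first and email seventh in the contacts
file) and the `|` delimiter are taken from the example command in this commit,
and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict


def emails_by_domain(domain_contacts_file, contacts_file, sep="|"):
    """Map each lowercased domain name to the set of its contacts' emails."""
    # contact ID -> email address, from the contacts file
    emails = {}
    for row in csv.reader(contacts_file, delimiter=sep):
        emails[row[0]] = row[6].lower()

    # follow each (domain, contact ID, role) link to the contact's email
    result = defaultdict(set)
    for row in csv.reader(domain_contacts_file, delimiter=sep):
        domain, contact_id = row[0].lower(), row[1]
        if contact_id in emails:
            result[domain].add(emails[contact_id])
    return dict(result)


# Invented sample data in the assumed layout.
links = io.StringIO("EXAMPLE.GOV|c1|admin\nEXAMPLE.GOV|c2|tech\n")
contacts = io.StringIO(
    "c1|a|b|c|d|e|Admin@Example.gov\n"
    "c2|a|b|c|d|e|tech@example.gov\n"
)
print(emails_by_domain(links, contacts))
```

Collecting emails into a set per domain means a person who holds more than one
contact type on the same domain appears only once.
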
The new registrar associates user accounts (authenticated with Login.gov) with
domains using a `UserDomainRole` linking table. New users can be granted roles
on domains by creating a `DomainInvitation` that links an email address with a
domain. When a new user finishes authenticating with Login.gov and their email
address matches an invitation, they are given the appropriate role on the
invitation's domain.

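The matching step can be sketched as follows. This is a simplified,
framework-free illustration and not the registrar's implementation: the
`Invitation` dataclass, the `accept_invitations` function, and the
`"invited"`/`"retrieved"` status strings are hypothetical stand-ins for the
`DomainInvitation` and `UserDomainRole` models, and a role is reduced to a
`(user, domain)` pair.

```python
from dataclasses import dataclass


@dataclass
class Invitation:
    email: str
    domain: str
    status: str = "invited"  # hypothetical status values


def accept_invitations(user_uuid, user_email, invitations):
    """Turn pending invitations for this email into (user, domain) roles."""
    roles = []
    for invitation in invitations:
        if invitation.status != "invited":
            continue  # already used
        if invitation.email.lower() != user_email.lower():
            continue  # addressed to someone else
        roles.append((user_uuid, invitation.domain))
        invitation.status = "retrieved"  # mark so it is not applied twice
    return roles
```

Comparing addresses case-insensitively is an assumption of this sketch; it
mirrors the lowercasing that the example import command applies to email
addresses.
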
For the purposes of migration, we can prime the invitation system by creating
an invitation in the system for each contact listed in the `domain_contacts`
file, using that contact's email address from the contacts file. This means
that if a person is currently a user in the Verisign system, and they use the
same email address with Login.gov, then they will end up with access to the
same domains in the new registrar that they were associated with in the
Verisign system.

A management command that does this needs to process two data files, one for
the contact information and one for the domain/contact association, so we
can't use stdin the way that we did before. Instead, we can use the fact that
Docker Compose mounts the `src/` directory inside of the container at `/app`.
Then, data files that are inside of the `src/` directory can be accessed
inside the Docker container.

An example script using this technique is in
`src/registrar/management/commands/load_domain_invitations.py`.

```shell
docker compose run app ./manage.py load_domain_invitations /app/escrow_domain_contacts.daily.dotgov.GOV.txt /app/escrow_contacts.daily.dotgov.GOV.txt
```
76
src/registrar/management/commands/load_domain_invitations.py
Normal file
@@ -0,0 +1,76 @@
"""Load domain invitations for existing domains and their contacts."""

import csv
import logging

from collections import defaultdict

from django.core.management import BaseCommand

from registrar.models import Domain, DomainInvitation

logger = logging.getLogger(__name__)


class Command(BaseCommand):
    help = "Load invitations for existing domains and their users."

    def add_arguments(self, parser):
        """Add our two filename arguments."""
        parser.add_argument(
            "domain_contacts_filename",
            help="Data file with domain contact information",
        )
        parser.add_argument(
            "contacts_filename", help="Data file with contact information"
        )

        parser.add_argument("--sep", default="|", help="Delimiter character")

    def handle(self, domain_contacts_filename, contacts_filename, **options):
        """Load the data files and create the DomainInvitations."""
        sep = options.get("sep")

        # We open the domain file first and hold it in memory.
        # There are three contacts per domain, so there should be at
        # most 3*N different contacts here.
        contact_domains = defaultdict(list)  # each contact has a list of domains
        logger.info("Reading domain-contacts data file %s", domain_contacts_filename)
        with open(domain_contacts_filename, "r") as domain_file:
            for row in csv.reader(domain_file, delimiter=sep):
                # fields are just domain, userid, role
                # lowercase the domain names now
                contact_domains[row[1]].append(row[0].lower())
        logger.info("Loaded domains for %d contacts", len(contact_domains))

        # now we have a mapping of user IDs to lists of domains for that user
        # iterate over the contacts list and for contacts in our mapping,
        # create the domain invitations for their email address
        logger.info("Reading contacts data file %s", contacts_filename)
        to_create = []
        skipped = 0
        with open(contacts_filename, "r") as contacts_file:
            for row in csv.reader(contacts_file, delimiter=sep):
                # userid is in the first field, email is the seventh
                userid = row[0]
                if userid not in contact_domains:
                    # this user has no domains, skip them
                    skipped += 1
                    continue
                for domain_name in contact_domains[userid]:
                    email_address = row[6]
                    domain = Domain.objects.get(name=domain_name)
                    to_create.append(
                        DomainInvitation(
                            email=email_address.lower(),
                            domain=domain,
                            status=DomainInvitation.INVITED,
                        )
                    )
        logger.info("Creating %d invitations", len(to_create))
        DomainInvitation.objects.bulk_create(to_create)
        logger.info(
            "Created %d domain invitations, ignored %d contacts",
            len(to_create),
            skipped,
        )
69
src/registrar/management/commands/load_domains_data.py
Normal file
@@ -0,0 +1,69 @@
"""Load domains from registry export."""

import csv
import logging
import sys

from django.core.management.base import BaseCommand

from registrar.models import Domain


logger = logging.getLogger(__name__)


def _domain_dict_reader(file_object, **kwargs):
    """A csv DictReader with the correct field names for escrow_domains data.

    All keyword arguments are sent on to the DictReader function call.
    """
    # field names are from escrow_manifests without "f"
    return csv.DictReader(
        file_object,
        fieldnames=[
            "Name",
            "Roid",
            "IdnTableId",
            "Registrant",
            "ClID",
            "CrRr",
            "CrID",
            "CrDate",
            "UpRr",
            "UpID",
            "UpDate",
            "ExDate",
            "TrDate",
        ],
        **kwargs,
    )


class Command(BaseCommand):
    help = "Load domain data from a delimited text file on stdin."

    def add_arguments(self, parser):
        parser.add_argument(
            "--sep", default="|", help="Separator character for data file"
        )

    def handle(self, *args, **options):
        separator_character = options.get("sep")
        reader = _domain_dict_reader(sys.stdin, delimiter=separator_character)
        # accumulate model objects so we can `bulk_create` them all at once.
        domains = []
        for row in reader:
            name = row["Name"].lower()  # we typically use lowercase domains

            # Ensure that there is a `Domain` object for each domain name in
            # this file and that it is active. There is a uniqueness
            # constraint for active Domain objects, so we are going to account
            # for that here with this check so that our later bulk_create
            # should succeed
            if Domain.objects.filter(name=name, is_active=True).exists():
                # don't do anything, this domain is here and active
                continue
            else:
                domains.append(Domain(name=name, is_active=True))
        logger.info("Creating %d new domains", len(domains))
        Domain.objects.bulk_create(domains)