Delete everything related to RDE import

This code was never finished or fully working anyway.  It would require
substantial reworking for the Registry 3.0 migration because it's closely tied
to the Datastore model and App Engine MapReduce framework, both of which will be
going away.  We can bring back some of these deleted test files as necessary
if/when we rewrite RDE import for the new schema.

On the plus side, in a relational database, RDE import will be much simpler.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=231265578
mcilwain 2019-01-28 12:08:47 -08:00 committed by Ben McIlwain
parent f0c677b18b
commit 5dedc1e889
90 changed files with 0 additions and 15750 deletions


@ -1,74 +0,0 @@
# Registry Data Import Architecture
*See also the [RDE usage guide](./rde-import-usage.md).*
The Registry Data Import feature was designed to handle escrow files from other
registries with millions of domains. In the spirit of divide and conquer, the
mapreduce library is used to break up the work of the import into smaller chunks
that can be processed in a reasonable period of time. This process is broken
down into four separate mapreduce jobs that must be run in sequence due to how
datastore transactions work and due to dependencies between registry objects.
The steps are broken up as follows:
__Initial Setup__ - This is a set of manual steps that must be completed before
the process can be run, and is out of the scope of this document. See
[Usage](./rde-import-usage.md) for more details.
__Contacts Import__ - Reads contact entries from an escrow file and saves them
as `ContactResource` entities. `HistoryEntry` entities are also created for the
contact. This process depends on initial setup, but does not depend on any
previous step being run.
__Hosts Import__ - Reads host entries from an escrow file and saves them as
`HostResource` entities. `HistoryEntry` entities are also created for the hosts.
This process depends on initial setup, but does not depend on any previous step
being run.
__Domains Import__ - Reads domain entries from an escrow file and saves them as
`DomainResource` entities. For each domain imported, a history entry, autorenew
billing event and autorenew poll message will also be created. For domains that
are in pending transfer state, the import process will also create future
entities for automatic server approval in the same fashion as domain transfer
request EPP messages. Domains cannot be imported until the contacts and hosts
required by the domain are imported in previous steps.
__Hosts Link__ - Reads host entries from an escrow file and links in-zone hosts
to their superordinate domains. This is the last step because both hosts and
domains have to be imported before the link can be made in both directions.
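The dependencies between the steps above can be written down as a small graph. The following is only a sketch of how we read the ordering constraints; the step identifiers and helper names are hypothetical and do not come from the Nomulus code base.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps that must finish before it can run.
# The names are illustrative labels for the steps described above.
DEPENDENCIES = {
    "contacts_import": {"initial_setup"},
    "hosts_import": {"initial_setup"},
    "domains_import": {"contacts_import", "hosts_import"},
    "hosts_link": {"hosts_import", "domains_import"},
}


def is_valid_run_order(steps):
    """Return True if every step appears after all of its prerequisites."""
    completed = set()
    for step in steps:
        if not DEPENDENCIES.get(step, set()) <= completed:
            return False
        completed.add(step)
    return True


def one_valid_order():
    """Return one ordering that satisfies every dependency."""
    return list(TopologicalSorter(DEPENDENCIES).static_order())
```

Under this reading, contacts and hosts could be imported in either order; the usage guide nonetheless prescribes a single fixed sequence.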
## Components
Each mapreduce job (with the exception of Hosts Link) is made up of a similar
set of components. Note that much of the work done by each job is essentially
the inverse of the Registry Data Export feature, and reuses the JAXB
representations of the XML elements that compose escrow files.
__Parser__ - The parser is the lowest level of the import process. This
component parses an escrow file (provided as an open stream from Google Cloud
Storage) into discrete JAXB objects. The parser maintains an internal cursor in
the XML file that represents the next element to be read, and can advance to and
skip any number of elements.
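To illustrate the cursor idea, here is a minimal sketch of a streaming parser that can skip ahead and then read the next element. It is written in Python with the standard `xml.etree.ElementTree` module rather than JAXB, the `ElementCursor` class is hypothetical, and the sample document omits the XML namespaces that real escrow files use.

```python
import io
import xml.etree.ElementTree as ET


class ElementCursor:
    """Streams elements with a given tag and supports skipping ahead."""

    def __init__(self, stream, tag):
        self._tag = tag
        # iterparse reads the stream incrementally instead of loading it whole.
        self._events = ET.iterparse(stream, events=("end",))

    def _next_match(self):
        # Resume the shared iterator until the next element with our tag.
        for _event, element in self._events:
            if element.tag == self._tag:
                return element
        return None

    def skip(self, count):
        """Advance past up to `count` matching elements; return how many were skipped."""
        skipped = 0
        while skipped < count:
            element = self._next_match()
            if element is None:
                break
            element.clear()  # release the skipped element's contents
            skipped += 1
        return skipped

    def read(self):
        """Return the next matching element, or None at the end of the input."""
        return self._next_match()


# Simplified stand-in for an escrow file (no namespaces, made-up attributes).
SAMPLE = (
    b"<deposit><contact id='c1'/><contact id='c2'/>"
    b"<host id='h1'/><contact id='c3'/></deposit>"
)

cursor = ElementCursor(io.BytesIO(SAMPLE), "contact")
cursor.skip(1)          # move past the first contact
second = cursor.read()  # the cursor now yields the second contact
```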
__Reader__ - The reader is configured by each mapreduce job to load an escrow
file from Cloud Storage and use a parser to read a selected subset of the file,
forwarding the results to a mapper.
__Input__ - An input is responsible for determining how many reader instances to
create and which section of the escrow file should be consumed by each reader.
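One plausible way for an input to divide the work is to hand each reader a contiguous range of elements. The sketch below assumes the total element count is known in advance; `shard_ranges` is a hypothetical helper, not the actual Nomulus implementation.

```python
def shard_ranges(total_elements, shard_count):
    """Split `total_elements` into contiguous (offset, count) ranges, one per reader."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    base, extra = divmod(total_elements, shard_count)
    ranges = []
    offset = 0
    for shard in range(shard_count):
        # The first `extra` shards take one additional element each.
        count = base + (1 if shard < extra else 0)
        if count == 0:
            break  # fewer elements than shards: no reader for empty ranges
        ranges.append((offset, count))
        offset += count
    return ranges
```

Each reader could then skip `offset` elements with its parser and read the next `count`.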
__Converter__ - The converter accepts a JAXB object and returns an equivalent
resource that can be saved to the datastore.
__Import Utility Logic__ - Common import logic is consolidated into a single
place, such as creation of index entities and escrow file validation.
__Mapper__ - The mapper accepts a stream of JAXB objects from the reader and
uses the converter to map them to resource objects. Then for each resource
object produced, the mapper will attempt to save the resource and any related
objects to the datastore in a transaction. This is an idempotent operation; if
any resource has been previously imported by the process, it will be ignored.
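The idempotent save can be sketched as a save-if-absent check. The in-memory store below is a hypothetical stand-in for the datastore, and the counter names are assumptions; in the real system the existence check and the write would need to happen inside one transaction.

```python
class InMemoryStore:
    """Stand-in for the datastore: maps a resource key to the saved resource."""

    def __init__(self):
        self._rows = {}

    def save_if_absent(self, key, resource):
        """Save the resource and return True, or return False if the key exists."""
        if key in self._rows:
            return False
        self._rows[key] = resource
        return True

    def __len__(self):
        return len(self._rows)


def import_resources(store, resources):
    """Save each resource once; resources that were already imported are ignored."""
    counters = {"imported": 0, "ignored": 0}
    for resource in resources:
        if store.save_if_absent(resource["id"], resource):
            counters["imported"] += 1
        else:
            counters["ignored"] += 1
    return counters


store = InMemoryStore()
contacts = [{"id": "c1"}, {"id": "c2"}]
first_run = import_resources(store, contacts)
second_run = import_resources(store, contacts)  # re-running changes nothing
```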
__Action Endpoint__ - The action endpoint is responsible for accepting requests
to launch each step of the import process, bootstrapping mapreduce jobs, and
redirecting the client to the status page of the import job. This is the entry
point of the import process.


@ -1,122 +0,0 @@
# Registry Data Import Usage Guide
In order to import an RDE escrow file into Nomulus, four different mapreduce
processes must be run in sequence. Future iterations of this feature may
automate the sequence, but for now each step of the process must be run
manually.
Note that only contacts, hosts and domains are imported by this process. Other
objects such as registrars and top level domains are not imported automatically,
and must be configured before running this import process (see below).
## Prerequisites
This document assumes that the import process is being executed against a fully
functioning instance of Nomulus that is set up with its own unique project
ID. See the other docs (particularly the [install guide](./install.md)) for
details and helpful links.
Before launching any of the import jobs, the top level domain (TLD) and all
registrars that are included in the escrow file must be created in the registry;
these cannot be created automatically. Detailed instructions on how to create
TLDs, registrars and registrar contacts can be found in
the [first steps tutorial](./first-steps-tutorial.md).
## How to load an escrow file
First, ensure that all of the Cloud Storage buckets are set up for
Nomulus. See the [Architecture documentation](./architecture.md) for details.
The escrow file that will be imported should be uploaded to the
`PROJECT-rde-import` Cloud Storage bucket. The escrow file should not be
compressed or encrypted. When launching each mapreduce job, reference the
absolute path to the file (just the path, not the bucket name) in the `path`
argument.
__TODO:__ Add `PROJECT-rde-import` bucket requirement to architecture doc.
## Overview of import steps
Domain registries vary enormously in size, so the number of objects included in
escrow files varies widely as well. In order to process these files
in a way that scales, the import of contacts, hosts and domains has been
implemented as a series of four mapreduce jobs. The four jobs are summarized as
follows:
* Contacts Import - Creates contacts in the registry.
* Hosts Import - Creates host objects in the registry. Note that only host
objects are supported at this time; escrow files with host attributes are not
supported and will be rejected by the import process.
* Domains Import - Creates domain objects in the registry. This step will also
publish NS records and A records to DNS as necessary based on referenced host
objects.
* Hosts Link - Links hosts to their superordinate domain objects. This step is
necessary to establish data integrity for imported objects.
The import steps __must__ be run in this order: Contacts Import, Hosts Import,
Domains Import, Hosts Link.
## Executing the import process
The import process steps must be executed by a user that is logged in as an
administrator of the deployed Nomulus instance. Currently, the way to launch the
process is to manually enter the proper URL into the user's web browser, which
will kick off the mapreduce job and load its status page. The status page will
serve as a way to monitor the progress of a job until completion. Once each job
is completed, the user can launch the next step in the same fashion.
Parameters:
* path - This is the path to the escrow file in Cloud Storage.
* mapShards - This is the number of shards that will be used to process the
file. The process has been tested with a mapShards setting of 100, which
seems to perform well in most cases.
The import process is deployed in the `backend` service, so "backend-dot-" will
be prepended to the hostname. Replace `PROJECT` below with the unique project
name under which the Nomulus instance is deployed, and `PATH` with the path to
the escrow file.
Launch Contacts Import:
`https://backend-dot-PROJECT.appspot.com/_dr/importRdeContacts?path=PATH&mapShards=100`
Launch Hosts Import:
`https://backend-dot-PROJECT.appspot.com/_dr/importRdeHosts?path=PATH&mapShards=100`
Launch Domains Import:
`https://backend-dot-PROJECT.appspot.com/_dr/importRdeDomains?path=PATH&mapShards=100`
Launch Hosts Link:
`https://backend-dot-PROJECT.appspot.com/_dr/linkRdeHosts?path=PATH&mapShards=100`
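As a convenience, the four launch URLs can be generated rather than typed by hand. This is a small sketch: the endpoint names and parameters are taken from the URLs above, while `launch_url`, the step keys, and the example project and path are hypothetical. Note that `urlencode` percent-encodes the slashes in the path.

```python
from urllib.parse import urlencode

# Endpoint names as they appear in the launch URLs above.
ENDPOINTS = {
    "contacts": "importRdeContacts",
    "hosts": "importRdeHosts",
    "domains": "importRdeDomains",
    "link": "linkRdeHosts",
}


def launch_url(project, step, path, map_shards=100):
    """Build the URL that launches one import step on the backend service."""
    query = urlencode({"path": path, "mapShards": map_shards})
    return f"https://backend-dot-{project}.appspot.com/_dr/{ENDPOINTS[step]}?{query}"


# Steps listed in the order in which they must be run.
urls = [
    launch_url("my-project", step, "/escrow/deposit.xml")
    for step in ("contacts", "hosts", "domains", "link")
]
```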
For each job, the mapreduce user interface will display the status of the
running job. The job is finished when all of the boxes on the left turn green
(at which point the next step of the process can be safely launched). If any of
the boxes turn red, it means that the job failed and the logs should be
consulted for errors (see below).
Note that each of these steps is idempotent, and can be run many times with no
harmful side effects or duplicated data.
## Monitoring and troubleshooting
On the status page, several counters will be shown as the job progresses,
indicating the number of operations attempted, how many succeeded, how many were
ignored, and whether there were any errors. This is a good tool to understand at a
high level how the job is progressing. The counters from a completed job can
also be compared against the known count of resources from an escrow file to
determine if the import was successful or not.
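The comparison can be sketched as follows. The counter names here are assumptions rather than the labels shown on the status page, and the sketch treats an import as complete when every expected resource was either saved or ignored as already present, with no errors.

```python
def import_looks_complete(expected_total, counters):
    """Return True when all expected resources are accounted for and none failed."""
    accounted_for = counters.get("succeeded", 0) + counters.get("ignored", 0)
    return counters.get("errors", 0) == 0 and accounted_for == expected_total


# Hypothetical counters copied from the status page of a finished job.
finished_job = {"attempted": 1000, "succeeded": 990, "ignored": 10, "errors": 0}
```

With these numbers, `import_looks_complete(1000, finished_job)` would hold, whereas a job reporting any errors would not pass the check.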
For a more detailed view, open up the logging for the project in
the [Google Cloud Console](https://console.cloud.google.com). The application
logs from the import process will show up under requests with a URL that starts
with `/_dr/mapreduce/workerCallback`. These logs provide a detailed view of which
resources were read from the file, which were imported, which had already existed
before the import, and which failed to import.
## Known limitations
Currently, the IDs of all registrars in the escrow file must match those that are
already configured in the registry. Future work is planned to map between
different internal and external registrar IDs by using the IANA IDs, which are
always consistent between registries.