Remove datastore related code (#1906)

Lai Jiang 2023-01-19 14:44:11 -05:00 committed by GitHub
parent 913edb23ee
commit e41fd7877e
152 changed files with 886 additions and 4460 deletions

@@ -120,17 +120,6 @@ specification for the cron task.
Here are the task queues in use by the system. All are push queues unless
explicitly marked as otherwise.
* `async-delete-pull` and `async-host-rename-pull` -- Pull queues for tasks to
asynchronously delete contacts/hosts and to asynchronously refresh DNS for
renamed hosts, respectively. Tasks are enqueued during EPP flows and then
handled in batches by the regularly running cron tasks
`DeleteContactsAndHostsAction` and `RefreshDnsOnHostRenameAction`.
* `bigquery-streaming-metrics` -- Queue for metrics that are asynchronously
streamed to BigQuery in the `Metrics` class. Tasks are enqueued during EPP
flows in `EppController`. This means that there is a lag of a few seconds to
a few minutes between when metrics are generated and when they are queryable
in BigQuery, but this is preferable to slowing all EPP flows down and
blocking them on BigQuery streaming.
* `brda` -- Queue for tasks to upload weekly Bulk Registration Data Access
(BRDA) files to a location where they are available to ICANN. The
`RdeStagingReducer` (part of the RDE MapReduce) creates these tasks at the
@@ -141,33 +130,6 @@ explicitly marked as otherwise.
`DnsWriter` for the TLD.
* `dns-publish` -- Queue for batches of DNS updates to be pushed to DNS
writers.
* `export-bigquery-poll` -- Queue for tasks to query the success/failure of a
given BigQuery export job. Tasks are enqueued by `BigqueryPollJobAction`.
* `export-commits` -- Queue for tasks to export commit log checkpoints. Tasks
are enqueued by `CommitLogCheckpointAction` (which is run every minute by
cron) and executed by `ExportCommitLogDiffAction`.
* `export-snapshot` -- Cron and push queue for tasks to load a Datastore
snapshot that was stored in Google Cloud Storage and export it to BigQuery.
Tasks are enqueued by both cron and `CheckSnapshotAction` and are executed
by both `ExportSnapshotAction` and `LoadSnapshotAction`.
* `export-snapshot-poll` -- Queue for tasks to check that a Datastore snapshot
has been successfully uploaded to Google Cloud Storage (this is an
asynchronous background operation that can take an indeterminate amount of
time). Once the snapshot is successfully uploaded, it is imported into
BigQuery. Tasks are enqueued by `ExportSnapshotAction` and executed by
`CheckSnapshotAction`.
* `export-snapshot-update-view` -- Queue for tasks to update the BigQuery
views to point to the most recently uploaded snapshot. Tasks are enqueued by
`LoadSnapshotAction` and executed by `UpdateSnapshotViewAction`.
* `group-members-sync` -- Cron queue for tasks to sync registrar contacts (not
domain contacts!) to Google Groups. Tasks are executed by
`SyncGroupMembersAction`.
* `load[0-9]` -- Queues used to load-test the system by `LoadTestAction`.
These queues don't need to exist except when actively running load tests
(running load tests on production environments is not recommended). There
are ten of these queues to provide simple sharding, because Nomulus is
capable of handling significantly more Queries Per Second than the highest
throttle limit available on task queues (which is 500 qps).
* `lordn-claims` and `lordn-sunrise` -- Pull queues for handling LORDN
exports. Tasks are enqueued synchronously during EPP commands depending on
whether the domain name in question has a claims notice ID.
@@ -298,109 +260,19 @@ of experience running a production registry using this codebase.
errors, it can be pushed to Production.
5. Repeat once weekly, or potentially more often.
## Cloud Datastore
## Cloud SQL
Nomulus uses [Cloud
Datastore](https://cloud.google.com/appengine/docs/java/datastore/) as its
primary database. Cloud Datastore is a NoSQL document database that provides
automatic horizontal scaling, high performance, and high availability. All
information that is persisted to Cloud Datastore takes the form of Java classes
annotated with `@Entity` that are located in the `model` package. The [Objectify
library](https://cloud.google.com/appengine/docs/java/gettingstarted/using-datastore-objectify)
is used to persist instances of these classes in a format that Datastore
understands.
A brief overview of the different entity types found in the App Engine Datastore
Viewer may help administrators understand what they are seeing. Note that some
of these entities are part of App Engine tools that are outside of the domain
registry codebase:
* `_AE_*` -- These entities are created by App Engine.
* `_ah_SESSION` -- These entities track App Engine client sessions.
* `_GAE_MR_*` -- These entities are generated by App Engine while running
MapReduces.
* `BackupStatus` -- There should only be one of these entities, used to
maintain the state of the backup process.
* `Cancellation` -- A cancellation is a special type of billing event which
represents the cancellation of another billing event such as a OneTime or
Recurring.
* `ClaimsList`, `ClaimsListShard`, and `ClaimsListSingleton` -- These entities
store the TMCH claims list, for use in trademark processing.
* `CommitLog*` -- These entities store the commit log information.
* `Contact` -- These hold the ICANN contact information (but not
registrar contacts, who have a separate entity type).
* `Cursor` -- We use Cursor entities to maintain state about daily processes,
remembering which dates have been processed. For instance, for the RDE
export, Cursor entities maintain the date up to which each TLD has been
exported.
* `Domain` -- These hold the ICANN domain information.
* `DomainRecord` -- These are used during the DNS update process.
* `EntityGroupRoot` -- There is only one EntityGroupRoot entity, which serves
as the Datastore parent of many other entities.
* `EppResourceIndex` -- These entities allow enumeration of EPP resources
(such as domains, hosts and contacts), which would otherwise be difficult to
do in Datastore.
* `ExceptionReportEntity` -- These entities are generated automatically by
ECatcher, a Google-internal logging and debugging tool. Non-Google users
should not encounter these entries.
* `ForeignKeyContactIndex`, `ForeignKeyDomainIndex`, and
`ForeignKeyHostIndex` -- These act as a unique index on contacts, domains
and hosts, allowing transactional lookup by foreign key.
* `HistoryEntry` -- A HistoryEntry is the record of a command which mutated an
EPP resource. It serves as the parent of BillingEvents and PollMessages.
* `HostRecord` -- These are used during the DNS update process.
* `Host` -- These hold the ICANN host information.
* `Lock` -- Lock entities are used to control access to a shared resource such
as an App Engine queue. Under ordinary circumstances, these locks will be
cleaned up automatically, and should not accumulate.
* `MR-*` -- These entities are generated by the App Engine MapReduce library
in the course of running MapReduces.
* `Modification` -- A Modification is a special type of billing event which
represents the modification of a OneTime billing event.
* `OneTime` -- A OneTime is a billing event which represents a one-time charge
or credit to the client (as opposed to Recurring).
* `pipeline-*` -- These entities are also generated by the App Engine
MapReduce library.
* `PollMessage` -- PollMessages are generated by the system to notify
registrars of asynchronous responses and status changes.
* `PremiumList`, `PremiumListEntry`, and `PremiumListRevision` -- The standard
method for determining which domain names receive premium pricing is to
maintain a static list of premium names. Each PremiumList contains some
number of PremiumListRevisions, each of which in turn contains a
PremiumListEntry for each premium name.
* `RdeRevision` -- These entities are used by the RDE subsystem in the process
of generating files.
* `Recurring` -- A Recurring is a billing event which represents a recurring
charge to the client (as opposed to OneTime).
* `Registrar` -- These hold information about client registrars.
* `RegistrarPoc` -- Registrars have contacts just as domains do. These are
stored in a special RegistrarPoc entity.
* `Registry` -- These hold information about the TLDs supported by the
Registry system.
* `RegistryCursor` -- These entities are the predecessor to the Cursor
entities. We are no longer using them, and will be deleting them soon.
* `ReservedList` -- Each ReservedList entity represents an entire list of
reserved names which cannot be registered. Each TLD can have one or more
attached reserved lists.
* `ServerSecret` -- this is a single entity containing the secret numbers used
for generating tokens such as XSRF tokens.
* `SignedMarkRevocationList` -- The entities together contain the Signed Mark
Data Revocation List file downloaded from the TMCH MarksDB each day. Each
entity contains up to 10,000 rows of the file, so depending on the size of
the file, there will be some handful of entities.
* `TmchCrl` -- This is a single entity containing ICANN's TMCH CA Certificate
Revocation List.
To be filled.
## Cloud Storage buckets
Nomulus uses [Cloud Storage](https://cloud.google.com/storage/) for bulk storage
of large flat files that aren't suitable for Datastore. These files include
backups, RDE exports, Datastore snapshots (for ingestion into BigQuery), and
reports. Each bucket name must be unique across all of Google Cloud Storage, so
we use the common recommended pattern of prefixing all buckets with the name of
the App Engine app (which is itself globally unique). Most of the bucket names
are configurable, but the defaults are as follows, with PROJECT standing in as a
placeholder for the App Engine app name:
of large flat files that aren't suitable for Cloud SQL. These files include
backups, RDE exports, and reports. Each bucket name must be unique across all of
Google Cloud Storage, so we use the common recommended pattern of prefixing all
buckets with the name of the App Engine app (which is itself globally unique).
Most of the bucket names are configurable, but the defaults are as follows, with
PROJECT standing in as a placeholder for the App Engine app name:
* `PROJECT-billing` -- Monthly invoice files for each registrar.
* `PROJECT-commits` -- Daily exports of commit logs that are needed for
@@ -421,9 +293,6 @@ placeholder for the App Engine app name:
regularly uploaded to the escrow provider. Lifecycle is set to 90 days. The
bucket must exist.
* `PROJECT-reporting` -- Contains monthly ICANN reporting files.
* `PROJECT-snapshots` -- Contains daily exports of Datastore entities of types
defined in `ExportConstants.java`. These are imported into BigQuery daily to
allow for in-depth querying.
* `PROJECT.appspot.com` -- Temporary MapReduce files are stored here. By
default, the App Engine MapReduce library places its temporary files in a
bucket named {project}.appspot.com. This bucket must exist. To keep