mirror of
https://github.com/google/nomulus.git
synced 2025-04-30 03:57:51 +02:00
Add more documentation on cron, Datastore, and Cloud Storage
Note that a lot of this is adapted from existing non-Markdown documentation written by Brian.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=129252200
parent 4cf5a7d67b
commit 6fc7eb40c6
1 changed file with 162 additions and 3 deletions
@@ -230,7 +230,8 @@ of experience running a production registry using this codebase.
All [cron tasks](https://cloud.google.com/appengine/docs/java/config/cron) are
specified in `cron.xml` files, with one per environment. There are more tasks
that execute in Production than in other environments, because tasks like
uploading RDE dumps are only done for the live system. Cron tasks execute on
the `backend` service.

Most cron tasks use the `TldFanoutAction` which is accessed via the
`/_dr/cron/fanout` URL path. This action, which is run by the BackendServlet on
@@ -245,12 +246,170 @@ The reason the `TldFanoutAction` exists is that a lot of tasks need to be done
separately for each TLD, such as RDE exports and NORDN uploads. It's simpler to
have a single cron entry that will create tasks for all TLDs than to have to
specify a separate cron task for each action for each TLD (though that is still
an option). Task queues also provide retry semantics in the event of transient
failures that a raw cron task does not. This is why there are some tasks that
do not fan out across TLDs that still use `TldFanoutAction` -- it's so that the
tasks retry in the face of transient errors.

The full list of URL parameters to `TldFanoutAction` that can be specified in
cron.xml is:
* `endpoint` -- The path of the action that should be executed (see `web.xml`).
* `queue` -- The cron queue to enqueue tasks in.
* `forEachRealTld` -- Specifies that the task should be run in each TLD of type
  `REAL`. This can be combined with `forEachTestTld`.
* `forEachTestTld` -- Specifies that the task should be run in each TLD of type
  `TEST`. This can be combined with `forEachRealTld`.
* `runInEmpty` -- Specifies that the task should be run globally, i.e. just
  once, rather than individually per TLD. This is provided to allow tasks to
  retry. It is called "`runInEmpty`" for historical reasons.
* `excludes` -- A list of TLDs to exclude from processing.
* `jitterSeconds` -- The execution of each per-TLD task is delayed by a
  different random number of seconds between zero and this max value.
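As a rough illustration of how these parameters might combine, the sketch below picks the TLDs a fan-out would target and gives each one a random delay. It is not the actual `TldFanoutAction` code: the `FanoutSketch` class, the `plan` method, and the map of TLD names to types are hypothetical, and treating `runInEmpty` as exclusive of the per-TLD flags is an assumption made for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of TLD selection for a fan-out; not the real TldFanoutAction. */
final class FanoutSketch {

  enum TldType { REAL, TEST }

  /**
   * Returns the targets to enqueue, each mapped to a delay in seconds.
   * The empty string stands for the single global ("runInEmpty") task.
   */
  static Map<String, Integer> plan(
      Map<String, TldType> tlds,
      boolean forEachRealTld,
      boolean forEachTestTld,
      boolean runInEmpty,
      Set<String> excludes,
      int jitterSeconds,
      Random random) {
    Map<String, Integer> targets = new LinkedHashMap<>();
    if (runInEmpty) {
      // Assumption: a global task runs once and skips the per-TLD fan-out.
      targets.put("", 0);
      return targets;
    }
    for (Map.Entry<String, TldType> entry : tlds.entrySet()) {
      boolean wanted =
          (forEachRealTld && entry.getValue() == TldType.REAL)
              || (forEachTestTld && entry.getValue() == TldType.TEST);
      if (wanted && !excludes.contains(entry.getKey())) {
        // Each per-TLD task gets its own random delay in [0, jitterSeconds].
        int delay = jitterSeconds > 0 ? random.nextInt(jitterSeconds + 1) : 0;
        targets.put(entry.getKey(), delay);
      }
    }
    return targets;
  }

  public static void main(String[] args) {
    // The TLD names here are made up for the example.
    Map<String, TldType> tlds = new LinkedHashMap<>();
    tlds.put("example", TldType.REAL);
    tlds.put("demo", TldType.REAL);
    tlds.put("sandbox", TldType.TEST);
    System.out.println(
        plan(tlds, true, false, false, Collections.singleton("demo"), 60, new Random()));
  }
}
```

With `forEachRealTld` set and `demo` excluded, only `example` is selected; the delay it receives varies from run to run.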

## Cloud Datastore

The Domain Registry platform uses
[Cloud Datastore](https://cloud.google.com/appengine/docs/java/datastore/) as
its primary database. Cloud Datastore is a NoSQL document database that
provides automatic horizontal scaling, high performance, and high availability.
All information that is persisted to Cloud Datastore takes the form of Java
classes annotated with `@Entity` that are located in the `model` package. The
[Objectify library](https://cloud.google.com/appengine/docs/java/gettingstarted/using-datastore-objectify)
is used to persist instances of these classes in a format that Datastore
understands.

A brief overview of the different entity types found in the App Engine Datastore
Viewer may help administrators understand what they are seeing. Note that some
of these entities are part of App Engine tools that are outside of the domain
registry codebase:

* `_AE_*` -- These entities are created by App Engine.
* `_ah_SESSION` -- These entities track App Engine client sessions.
* `_GAE_MR_*` -- These entities are generated by App Engine while running
  MapReduces.
* `BackupStatus` -- There should only be one of these entities, used to maintain
  the state of the backup process.
* `Cancellation` -- A cancellation is a special type of billing event which
  represents the cancellation of another billing event such as a OneTime or
  Recurring.
* `ClaimsList`, `ClaimsListShard`, and `ClaimsListSingleton` -- These entities
  store the TMCH claims list, for use in trademark processing.
* `CommitLog*` -- These entities store the commit log information.
* `ContactResource` -- These hold the ICANN contact information (but not
  registrar contacts, who have a separate entity type).
* `Cursor` -- We use Cursor entities to maintain state about daily processes,
  remembering which dates have been processed. For instance, for the RDE export,
  Cursor entities maintain the date up to which each TLD has been exported.
* `DomainApplicationIndex` -- These hold domain applications received during the
  sunrise period.
* `DomainBase` -- These hold the ICANN domain information.
* `DomainRecord` -- These are used during the DNS update process.
* `EntityGroupRoot` -- There is only one EntityGroupRoot entity, which serves as
  the Datastore parent of many other entities.
* `EppResourceIndex` -- These entities allow enumeration of EPP resources (such
  as domains, hosts and contacts), which would otherwise be difficult to do in
  Datastore.
* `ExceptionReportEntity` -- These entities are generated automatically by
  ECatcher, a Google-internal logging and debugging tool. Non-Google users
  should not encounter these entries.
* `ForeignKeyContactIndex`, `ForeignKeyDomainIndex`, and `ForeignKeyHostIndex`
  -- These act as a unique index on contacts, domains and hosts, allowing
  transactional lookup by foreign key.
* `HistoryEntry` -- A HistoryEntry is the record of a command which mutated an
  EPP resource. It serves as the parent of BillingEvents and PollMessages.
* `HostRecord` -- These are used during the DNS update process.
* `HostResource` -- These hold the ICANN host information.
* `Lock` -- Lock entities are used to control access to a shared resource such
  as an App Engine queue. Under ordinary circumstances, these locks will be
  cleaned up automatically, and should not accumulate.
* `LogsExportCursor` -- This is a single entity which maintains the state of log
  export.
* `MR-*` -- These entities are generated by the App Engine MapReduce library in
  the course of running MapReduces.
* `Modification` -- A Modification is a special type of billing event which
  represents the modification of a OneTime billing event.
* `OneTime` -- A OneTime is a billing event which represents a one-time charge
  or credit to the client (as opposed to Recurring).
* `pipeline-*` -- These entities are also generated by the App Engine MapReduce
  library.
* `PollMessage` -- PollMessages are generated by the system to notify registrars
  of asynchronous responses and status changes.
* `PremiumList`, `PremiumListEntry`, and `PremiumListRevision` -- The standard
  method for determining which domain names receive premium pricing is to
  maintain a static list of premium names. Each PremiumList contains some number
  of PremiumListRevisions, each of which in turn contains a PremiumListEntry for
  each premium name.
* `RdeRevision` -- These entities are used by the RDE subsystem in the process
  of generating files.
* `Recurring` -- A Recurring is a billing event which represents a recurring
  charge to the client (as opposed to OneTime).
* `Registrar` -- These hold information about client registrars.
* `RegistrarContact` -- Registrars have contacts just as domains do. These are
  stored in a special RegistrarContact entity.
* `RegistrarCredit` and `RegistrarCreditBalance` -- The system supports the
  concept of a registrar credit balance, which is a pool of credit that the
  registrar can use to offset amounts they owe. This might come from promotions,
  for instance. These entities maintain registrars' balances.
* `Registry` -- These hold information about the TLDs supported by the Registry
  system.
* `RegistryCursor` -- These entities are the predecessor to the Cursor
  entities. We are no longer using them, and will be deleting them soon.
* `ReservedList` -- Each ReservedList entity represents an entire list of
  reserved names which cannot be registered. Each TLD can have one or more
  attached reserved lists.
* `ServerSecret` -- This is a single entity containing the secret numbers used
  for generating tokens such as XSRF tokens.
* `SignedMarkRevocationList` -- These entities together contain the Signed Mark
  Data Revocation List file downloaded from the TMCH MarksDB each day. Each
  entity contains up to 10,000 rows of the file, so depending on the size of the
  file, there will be a handful of entities.
* `TmchCrl` -- This is a single entity containing ICANN's TMCH CA Certificate
  Revocation List.
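
The `SignedMarkRevocationList` entry above notes that each entity holds up to 10,000 rows of the file. A minimal sketch of that kind of sharding follows. The `ShardingSketch` class and its `shard` method are hypothetical and are not the code the registry uses; they only illustrate why a large file ends up stored as a handful of entities.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a list of rows into shards of bounded size. */
final class ShardingSketch {

  /** Row limit per entity, taken from the description above. */
  static final int MAX_ROWS_PER_SHARD = 10_000;

  /** Splits rows into consecutive shards of at most shardSize rows each. */
  static <T> List<List<T>> shard(List<T> rows, int shardSize) {
    if (shardSize <= 0) {
      throw new IllegalArgumentException("shardSize must be positive");
    }
    List<List<T>> shards = new ArrayList<>();
    for (int start = 0; start < rows.size(); start += shardSize) {
      int end = Math.min(start + shardSize, rows.size());
      // Copy the view so each shard is independent of the source list.
      shards.add(new ArrayList<>(rows.subList(start, end)));
    }
    return shards;
  }

  public static void main(String[] args) {
    List<Integer> rows = new ArrayList<>();
    for (int i = 0; i < 25_000; i++) {
      rows.add(i);
    }
    // 25,000 rows need three shards: 10,000 + 10,000 + 5,000.
    System.out.println(shard(rows, MAX_ROWS_PER_SHARD).size()); // prints 3
  }
}
```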

## Cloud Storage buckets

The Domain Registry platform uses
[Cloud Storage](https://cloud.google.com/storage/) for bulk storage of large
flat files that aren't suitable for Datastore. These files include backups, RDE
exports, Datastore snapshots (for ingestion into BigQuery), and reports. Each
bucket name must be unique across all of Google Cloud Storage, so we use the
common recommended pattern of prefixing all buckets with the name of the App
Engine app (which is itself globally unique). Most of the bucket names are
configurable, but the defaults are as follows, with PROJECT standing in as a
placeholder for the App Engine app name:

* `PROJECT-billing` -- Monthly invoice files for each registrar.
* `PROJECT-commits` -- Daily exports of commit logs that are needed for
  potentially performing a restore.
* `PROJECT-domain-lists` -- Daily exports of all registered domain names per
  TLD.
* `PROJECT-gcs-logs` -- This bucket is used at Google to store the GCS access
  logs and storage data. This bucket is not required by the Registry system,
  but can provide useful logging information. For instructions on setup, see
  the
  [Cloud Storage documentation](https://cloud.google.com/storage/docs/access-logs).
* `PROJECT-icann-brda` -- This bucket contains the weekly ICANN BRDA files.
  There is no lifecycle expiration; we keep a history of all the files. This
  bucket must exist for the BRDA process to function.
* `PROJECT-icann-zfa` -- This bucket contains the most recent ICANN ZFA
  files. No lifecycle is needed, because the files are overwritten each time.
* `PROJECT-rde` -- This bucket contains RDE exports, which should then be
  regularly uploaded to the escrow provider. Lifecycle is set to 90 days. The
  bucket must exist.
* `PROJECT-reporting` -- Contains monthly ICANN reporting files.
* `PROJECT-snapshots` -- Contains daily exports of Datastore entities of types
  defined in `ExportConstants.java`. These are imported into BigQuery daily to
  allow for in-depth querying.
* `PROJECT.appspot.com` -- Temporary MapReduce files are stored here. By
  default, the App Engine MapReduce library places its temporary files in a
  bucket named `PROJECT.appspot.com`. This bucket must exist. To keep temporary
  files from building up, a 90-day or 180-day lifecycle should be applied to the
  bucket, depending on how long you want to be able to go back and debug
  MapReduce problems. At 30 GB per day of generated temporary files, this bucket
  may be the largest consumer of storage, so only save what you actually use.
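
The default names above follow one pattern, and the lifecycle advice for the temporary-files bucket comes down to simple arithmetic. The sketch below illustrates both. The `BucketSketch` class and the app name `my-registry` are made up for the example; the suffixes are copied from the list above, and the 30 GB per day figure is the one quoted there and may differ in other deployments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the default bucket naming pattern. */
final class BucketSketch {

  /** Suffixes of the default buckets listed above. */
  private static final String[] SUFFIXES = {
    "billing", "commits", "domain-lists", "gcs-logs", "icann-brda",
    "icann-zfa", "rde", "reporting", "snapshots"
  };

  /** Returns the default bucket names for the given App Engine app name. */
  static List<String> defaultBuckets(String project) {
    List<String> names = new ArrayList<>();
    for (String suffix : SUFFIXES) {
      names.add(project + "-" + suffix);
    }
    // The MapReduce temporary-files bucket uses a dotted name instead.
    names.add(project + ".appspot.com");
    return names;
  }

  /** Steady-state gigabytes retained when objects expire after lifecycleDays. */
  static long steadyStateGb(long gbPerDay, int lifecycleDays) {
    return gbPerDay * lifecycleDays;
  }

  public static void main(String[] args) {
    // "my-registry" is a made-up app name.
    defaultBuckets("my-registry").forEach(System.out::println);
    System.out.println(steadyStateGb(30, 90));  // prints 2700
    System.out.println(steadyStateGb(30, 180)); // prints 5400
  }
}
```

At the quoted rate, a 90-day lifecycle settles at roughly 2.7 TB of temporary files and a 180-day lifecycle at roughly 5.4 TB.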

## Commit logs

## Web.xml

## Cursors