Add more documentation on cron, Datastore, and Cloud Storage

Note that a lot of this is adapted from existing non-Markdown documentation written by Brian.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=129252200
mcilwain 2016-08-03 13:38:03 -07:00 committed by Ben McIlwain
parent 4cf5a7d67b
commit 6fc7eb40c6


@ -230,7 +230,8 @@ of experience running a production registry using this codebase.
All [cron tasks](https://cloud.google.com/appengine/docs/java/config/cron) are
specified in `cron.xml` files, with one per environment. There are more tasks
that execute in Production than in other environments, because tasks like
uploading RDE dumps are only done for the live system. Cron tasks execute on
the `backend` service.
Most cron tasks use the `TldFanoutAction`, which is accessed via the
`/_dr/cron/fanout` URL path. This action, which is run by the BackendServlet on
@ -245,12 +246,170 @@ The reason the `TldFanoutAction` exists is that a lot of tasks need to be done
separately for each TLD, such as RDE exports and NORDN uploads. It's simpler to
have a single cron entry that will create tasks for all TLDs than to have to
specify a separate cron task for each action for each TLD (though that is still
an option). Task queues also provide retry semantics in the event of transient
failures, which a raw cron task does not. This is why some tasks that do not fan
out across TLDs still use `TldFanoutAction`: it ensures that they are retried in
the face of transient errors.
The full list of URL parameters to `TldFanoutAction` that can be specified in
`cron.xml` is:
* `endpoint` -- The path of the action that should be executed (see `web.xml`).
* `queue` -- The cron queue to enqueue tasks in.
* `forEachRealTld` -- Specifies that the task should be run in each TLD of type
`REAL`. This can be combined with `forEachTestTld`.
* `forEachTestTld` -- Specifies that the task should be run in each TLD of type
`TEST`. This can be combined with `forEachRealTld`.
* `runInEmpty` -- Specifies that the task should be run globally, i.e. just
once, rather than individually per TLD. This is provided to allow tasks to
retry. It is called "`runInEmpty`" for historical reasons.
* `excludes` -- A list of TLDs to exclude from processing.
* `jitterSeconds` -- The execution of each per-TLD task is delayed by a
different random number of seconds between zero and this max value.
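As a rough illustration of how these parameters combine, the following Python
sketch builds a fanout URL and models which per-TLD tasks a single run would
create. It is not the actual `TldFanoutAction` implementation. The endpoint
path, queue name, and TLD names are hypothetical, and the comma-separated
`excludes` format, the bare-key encoding of the boolean flags, and the jitter
applied to the global task are all assumptions.

```python
import random
from urllib.parse import urlencode


def fanout_url(endpoint, queue, flags=(), excludes=(), jitter_seconds=None):
    """Build a /_dr/cron/fanout URL from the parameters listed above."""
    params = [("queue", queue), ("endpoint", endpoint)]
    if excludes:
        # Assumption: the exclusion list is comma-separated.
        params.append(("excludes", ",".join(excludes)))
    if jitter_seconds is not None:
        params.append(("jitterSeconds", str(jitter_seconds)))
    query = urlencode(params, safe="/,")
    # Assumption: boolean parameters are passed as bare keys with no value.
    for flag in flags:
        query += "&" + flag
    return "/_dr/cron/fanout?" + query


def plan_tasks(tlds, flags, excludes=(), jitter_seconds=0, rng=random):
    """Return (tld, delay_seconds) pairs modelling one fan-out run.

    `tlds` maps each TLD name to its type, "REAL" or "TEST".
    """
    targets = []
    if "runInEmpty" in flags:
        targets.append("")  # a single global task, not tied to any TLD
    wanted_types = set()
    if "forEachRealTld" in flags:
        wanted_types.add("REAL")
    if "forEachTestTld" in flags:
        wanted_types.add("TEST")
    targets += sorted(
        name for name, kind in tlds.items()
        if kind in wanted_types and name not in excludes
    )
    # Each task gets its own random delay between zero and the maximum.
    return [(name, rng.randint(0, jitter_seconds)) for name in targets]


if __name__ == "__main__":
    print(fanout_url("/_dr/task/exampleAction", "example-queue",
                     flags=["forEachRealTld"], jitter_seconds=60))
    print(plan_tasks({"app": "REAL", "dev": "REAL", "demo": "TEST"},
                     {"forEachRealTld"}, excludes={"dev"}, jitter_seconds=60))
```

The resulting URL is what would go in the `<url>` element of a `cron.xml`
entry.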
## Cloud Datastore
The Domain Registry platform uses
[Cloud Datastore](https://cloud.google.com/appengine/docs/java/datastore/) as
its primary database. Cloud Datastore is a NoSQL document database that
provides automatic horizontal scaling, high performance, and high availability.
All information that is persisted to Cloud Datastore takes the form of Java
classes annotated with `@Entity` that are located in the `model` package. The
[Objectify library](https://cloud.google.com/appengine/docs/java/gettingstarted/using-datastore-objectify)
is used to persist instances of these classes in a format that Datastore
understands.
A brief overview of the different entity types found in the App Engine Datastore
Viewer may help administrators understand what they are seeing. Note that some
of these entities are part of App Engine tools that are outside of the domain
registry codebase:
* `_AE_*` -- These entities are created by App Engine.
* `_ah_SESSION` -- These entities track App Engine client sessions.
* `_GAE_MR_*` -- These entities are generated by App Engine while running
MapReduces.
* `BackupStatus` -- There should only be one of these entities, used to maintain
the state of the backup process.
* `Cancellation` -- A cancellation is a special type of billing event which
represents the cancellation of another billing event such as a OneTime or
Recurring.
* `ClaimsList`, `ClaimsListShard`, and `ClaimsListSingleton` -- These entities
store the TMCH claims list, for use in trademark processing.
* `CommitLog*` -- These entities store the commit log information.
* `ContactResource` -- These hold the ICANN contact information (but not
registrar contacts, which have a separate entity type).
* `Cursor` -- We use Cursor entities to maintain state about daily processes,
remembering which dates have been processed. For instance, for the RDE export,
Cursor entities maintain the date up to which each TLD has been exported.
* `DomainApplicationIndex` -- These hold domain applications received during the
sunrise period.
* `DomainBase` -- These hold the ICANN domain information.
* `DomainRecord` -- These are used during the DNS update process.
* `EntityGroupRoot` -- There is only one EntityGroupRoot entity, which serves as
the Datastore parent of many other entities.
* `EppResourceIndex` -- These entities allow enumeration of EPP resources (such
as domains, hosts and contacts), which would otherwise be difficult to do in
Datastore.
* `ExceptionReportEntity` -- These entities are generated automatically by
ECatcher, a Google-internal logging and debugging tool. Non-Google users
should not encounter these entries.
* `ForeignKeyContactIndex`, `ForeignKeyDomainIndex`, and `ForeignKeyHostIndex`
-- These act as a unique index on contacts, domains and hosts, allowing
transactional lookup by foreign key.
* `HistoryEntry` -- A HistoryEntry is the record of a command which mutated an
EPP resource. It serves as the parent of BillingEvents and PollMessages.
* `HostRecord` -- These are used during the DNS update process.
* `HostResource` -- These hold the ICANN host information.
* `Lock` -- Lock entities are used to control access to a shared resource such
as an App Engine queue. Under ordinary circumstances, these locks will be
cleaned up automatically, and should not accumulate.
* `LogsExportCursor` -- This is a single entity which maintains the state of log
export.
* `MR-*` -- These entities are generated by the App Engine MapReduce library in
the course of running MapReduces.
* `Modification` -- A Modification is a special type of billing event which
represents the modification of a OneTime billing event.
* `OneTime` -- A OneTime is a billing event which represents a one-time charge
or credit to the client (as opposed to Recurring).
* `pipeline-*` -- These entities are also generated by the App Engine MapReduce
library.
* `PollMessage` -- PollMessages are generated by the system to notify registrars
of asynchronous responses and status changes.
* `PremiumList`, `PremiumListEntry`, and `PremiumListRevision` -- The standard
method for determining which domain names receive premium pricing is to
maintain a static list of premium names. Each PremiumList contains some number
of PremiumListRevisions, each of which in turn contains a PremiumListEntry for
each premium name.
* `RdeRevision` -- These entities are used by the RDE subsystem in the process
of generating files.
* `Recurring` -- A Recurring is a billing event which represents a recurring
charge to the client (as opposed to OneTime).
* `Registrar` -- These hold information about client registrars.
* `RegistrarContact` -- Registrars have contacts just as domains do. These are
stored in a special RegistrarContact entity.
* `RegistrarCredit` and `RegistrarCreditBalance` -- The system supports the
concept of a registrar credit balance, which is a pool of credit that the
registrar can use to offset amounts they owe. This might come from promotions,
for instance. These entities maintain registrars' balances.
* `Registry` -- These hold information about the TLDs supported by the Registry
system.
* `RegistryCursor` -- These entities are the predecessor to the Cursor
entities. We are no longer using them, and will be deleting them soon.
* `ReservedList` -- Each ReservedList entity represents an entire list of
reserved names which cannot be registered. Each TLD can have one or more
attached reserved lists.
* `ServerSecret` -- This is a single entity containing the secret numbers used
for generating tokens such as XSRF tokens.
* `SignedMarkRevocationList` -- These entities together contain the Signed Mark
Data Revocation List file downloaded from the TMCH MarksDB each day. Each
entity contains up to 10,000 rows of the file, so the number of entities
depends on the size of the file.
* `TmchCrl` -- This is a single entity containing ICANN's TMCH CA Certificate
Revocation List.
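To make the `Cursor` entry above more concrete, here is a minimal Python sketch
of how a per-TLD cursor might drive a daily process such as the RDE export. It
is an illustration only, not the registry's implementation: the function names
are hypothetical, and it assumes the cursor stores the next date still to be
processed and is advanced only after a successful export.

```python
from datetime import date, timedelta


def pending_dates(cursor, today):
    """Dates not yet processed: from the cursor up to, but excluding, today."""
    days = (today - cursor).days
    return [cursor + timedelta(days=n) for n in range(max(days, 0))]


def run_daily_export(cursors, tld, today, export):
    """Export each pending date for `tld`, advancing its cursor as it goes."""
    for day in pending_dates(cursors[tld], today):
        export(tld, day)  # if this raises, the cursor stays at `day`
        cursors[tld] = day + timedelta(days=1)
    return cursors[tld]


if __name__ == "__main__":
    cursors = {"example": date(2016, 8, 1)}
    run_daily_export(cursors, "example", date(2016, 8, 3),
                     lambda tld, day: print("exporting", tld, day))
    print("cursor is now", cursors["example"])
```

Because the cursor only moves forward after a successful export, a run that
fails partway through can simply be retried and will resume from the first
unprocessed date.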
## Cloud Storage buckets
The Domain Registry platform uses
[Cloud Storage](https://cloud.google.com/storage/) for bulk storage of large
flat files that aren't suitable for Datastore. These files include backups, RDE
exports, Datastore snapshots (for ingestion into BigQuery), and reports. Each
bucket name must be unique across all of Google Cloud Storage, so we use the
common recommended pattern of prefixing all buckets with the name of the App
Engine app (which is itself globally unique). Most of the bucket names are
configurable, but the defaults are as follows, with PROJECT standing in as a
placeholder for the App Engine app name:
* `PROJECT-billing` -- Monthly invoice files for each registrar.
* `PROJECT-commits` -- Daily exports of commit logs that are needed for
potentially performing a restore.
* `PROJECT-domain-lists` -- Daily exports of all registered domain names per
TLD.
* `PROJECT-gcs-logs` -- This bucket is used at Google to store the GCS access
logs and storage data. This bucket is not required by the Registry system,
but can provide useful logging information. For instructions on setup, see
the
[Cloud Storage documentation](https://cloud.google.com/storage/docs/access-logs).
* `PROJECT-icann-brda` -- This bucket contains the weekly ICANN BRDA files.
There is no lifecycle expiration; we keep a history of all the files. This
bucket must exist for the BRDA process to function.
* `PROJECT-icann-zfa` -- This bucket contains the most recent ICANN ZFA
files. No lifecycle is needed, because the files are overwritten each time.
* `PROJECT-rde` -- This bucket contains RDE exports, which should then be
regularly uploaded to the escrow provider. Lifecycle is set to 90 days. The
bucket must exist.
* `PROJECT-reporting` -- Contains monthly ICANN reporting files.
* `PROJECT-snapshots` -- Contains daily exports of Datastore entities of types
defined in `ExportConstants.java`. These are imported into BigQuery daily to
allow for in-depth querying.
* `PROJECT.appspot.com` -- Temporary MapReduce files are stored here. By
default, the App Engine MapReduce library places its temporary files in a
bucket named `PROJECT.appspot.com`. This bucket must exist. To keep temporary
files from building up, a 90-day or 180-day lifecycle should be applied to the
bucket, depending on how long you want to be able to go back and debug
MapReduce problems. At 30 GB per day of generated temporary files, this bucket
may be the largest consumer of storage, so only save what you actually use.
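The naming pattern above can be sketched in a few lines of Python. This is only
an illustration: `example-project` is a placeholder app name, and the helper
functions are hypothetical rather than part of the codebase. The lifecycle
dictionary follows the JSON layout that Cloud Storage accepts for a
delete-by-age rule, but check it against the current Cloud Storage
documentation before applying it to a real bucket.

```python
import json

# Suffixes of the default bucket names listed above.
DEFAULT_SUFFIXES = [
    "billing", "commits", "domain-lists", "gcs-logs", "icann-brda",
    "icann-zfa", "rde", "reporting", "snapshots",
]


def default_buckets(project):
    """Return the default bucket names for the given App Engine app name."""
    names = [f"{project}-{suffix}" for suffix in DEFAULT_SUFFIXES]
    names.append(f"{project}.appspot.com")  # temporary MapReduce files
    return names


def delete_after_days(days):
    """Lifecycle configuration that deletes objects older than `days` days."""
    return {"rule": [{"action": {"type": "Delete"}, "condition": {"age": days}}]}


if __name__ == "__main__":
    for name in default_buckets("example-project"):
        print(name)
    # A 90-day lifecycle, as suggested for the RDE and MapReduce buckets.
    print(json.dumps(delete_after_days(90), indent=2))
```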
## Commit logs
## Web.xml
## Cursors