From 6fc7eb40c65334c6e54aa80df96c7631308d460d Mon Sep 17 00:00:00 2001
From: mcilwain
Date: Wed, 3 Aug 2016 13:38:03 -0700
Subject: [PATCH] Add more documentation on cron, Datastore, and Cloud Storage

Note that a lot of this is adapted from existing non-Markdown documentation written by Brian.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=129252200
---
 docs/app-engine-architecture.md | 165 +++++++++++++++++++++++++++++++-
 1 file changed, 162 insertions(+), 3 deletions(-)

diff --git a/docs/app-engine-architecture.md b/docs/app-engine-architecture.md
index 9fb7a0ced..daf414062 100644
--- a/docs/app-engine-architecture.md
+++ b/docs/app-engine-architecture.md
@@ -230,7 +230,8 @@ of experience running a production registry using this codebase.
 All [cron tasks](https://cloud.google.com/appengine/docs/java/config/cron) are
 specified in `cron.xml` files, with one per environment. There are more tasks
 that execute in Production than in other environments, because tasks like
-uploading RDE dumps are only done for the live system.
+uploading RDE dumps are only done for the live system. Cron tasks execute on
+the `backend` service.

 Most cron tasks use the `TldFanoutAction` which is accessed via the
 `/_dr/cron/fanout` URL path. This action, which is run by the BackendServlet on
@@ -245,12 +246,170 @@ The reason the `TldFanoutAction` exists is that a lot of tasks need to be done
 separately for each TLD, such as RDE exports and NORDN uploads. It's simpler to
 have a single cron entry that will create tasks for all TLDs than to have to
 specify a separate cron task for each action for each TLD (though that is still
-an option).
+an option). Task queues also provide retry semantics in the event of transient
+failures that a raw cron task does not. This is why there are some tasks that
+do not fan out across TLDs that still use `TldFanoutAction` -- it's so that the
+tasks retry in the face of transient errors.

-## Datastore entities
+The full list of URL parameters to `TldFanoutAction` that can be specified in
+cron.xml is:
+* `endpoint` -- The path of the action that should be executed (see `web.xml`).
+* `queue` -- The cron queue to enqueue tasks in.
+* `forEachRealTld` -- Specifies that the task should be run in each TLD of type
+  `REAL`. This can be combined with `forEachTestTld`.
+* `forEachTestTld` -- Specifies that the task should be run in each TLD of type
+  `TEST`. This can be combined with `forEachRealTld`.
+* `runInEmpty` -- Specifies that the task should be run globally, i.e. just
+  once, rather than individually per TLD. This is provided to allow tasks to
+  retry. It is called "`runInEmpty`" for historical reasons.
+* `excludes` -- A list of TLDs to exclude from processing.
+* `jitterSeconds` -- The execution of each per-TLD task is delayed by a
+  different random number of seconds between zero and this max value.
+
+## Cloud Datastore
+
+The Domain Registry platform uses
+[Cloud Datastore](https://cloud.google.com/appengine/docs/java/datastore/) as
+its primary database. Cloud Datastore is a NoSQL document database that
+provides automatic horizontal scaling, high performance, and high availability.
+All information that is persisted to Cloud Datastore takes the form of Java
+classes annotated with `@Entity` that are located in the `model` package. The
+[Objectify library](https://cloud.google.com/appengine/docs/java/gettingstarted/using-datastore-objectify)
+is used to persist instances of these classes in a format that Datastore
+understands.
+
+A brief overview of the different entity types found in the App Engine Datastore
+Viewer may help administrators understand what they are seeing. Note that some
+of these entities are part of App Engine tools that are outside of the domain
+registry codebase:
+
+* `\_AE\_*` -- These entities are created by App Engine.
+* `\_ah\_SESSION` -- These entities track App Engine client sessions.
+* `\_GAE\_MR\_*` -- These entities are generated by App Engine while running
+  MapReduces.
+* `BackupStatus` -- There should only be one of these entities, used to maintain
+  the state of the backup process.
+* `Cancellation` -- A cancellation is a special type of billing event which
+  represents the cancellation of another billing event such as a OneTime or
+  Recurring.
+* `ClaimsList`, `ClaimsListShard`, and `ClaimsListSingleton` -- These entities
+  store the TMCH claims list, for use in trademark processing.
+* `CommitLog*` -- These entities store the commit log information.
+* `ContactResource` -- These hold the ICANN contact information (but not
+  registrar contacts, who have a separate entity type).
+* `Cursor` -- We use Cursor entities to maintain state about daily processes,
+  remembering which dates have been processed. For instance, for the RDE export,
+  Cursor entities maintain the date up to which each TLD has been exported.
+* `DomainApplicationIndex` -- These hold domain applications received during the
+  sunrise period.
+* `DomainBase` -- These hold the ICANN domain information.
+* `DomainRecord` -- These are used during the DNS update process.
+* `EntityGroupRoot` -- There is only one EntityGroupRoot entity, which serves as
+  the Datastore parent of many other entities.
+* `EppResourceIndex` -- These entities allow enumeration of EPP resources (such
+  as domains, hosts and contacts), which would otherwise be difficult to do in
+  Datastore.
+* `ExceptionReportEntity` -- These entities are generated automatically by
+  ECatcher, a Google-internal logging and debugging tool. Non-Google users
+  should not encounter these entries.
+* `ForeignKeyContactIndex`, `ForeignKeyDomainIndex`, and `ForeignKeyHostIndex`
+  -- These act as a unique index on contacts, domains and hosts, allowing
+  transactional lookup by foreign key.
+* `HistoryEntry` -- A HistoryEntry is the record of a command which mutated an
+  EPP resource. It serves as the parent of BillingEvents and PollMessages.
+* `HostRecord` -- These are used during the DNS update process.
+* `HostResource` -- These hold the ICANN host information.
+* `Lock` -- Lock entities are used to control access to a shared resource such
+  as an App Engine queue. Under ordinary circumstances, these locks will be
+  cleaned up automatically, and should not accumulate.
+* `LogsExportCursor` -- This is a single entity which maintains the state of log
+  export.
+* `MR-*` -- These entities are generated by the App Engine MapReduce library in
+  the course of running MapReduces.
+* `Modification` -- A Modification is a special type of billing event which
+  represents the modification of a OneTime billing event.
+* `OneTime` -- A OneTime is a billing event which represents a one-time charge
+  or credit to the client (as opposed to Recurring).
+* `pipeline-*` -- These entities are also generated by the App Engine MapReduce
+  library.
+* `PollMessage` -- PollMessages are generated by the system to notify registrars
+  of asynchronous responses and status changes.
+* `PremiumList`, `PremiumListEntry`, and `PremiumListRevision` -- The standard
+  method for determining which domain names receive premium pricing is to
+  maintain a static list of premium names. Each PremiumList contains some number
+  of PremiumListRevisions, each of which in turn contains a PremiumListEntry for
+  each premium name.
+* `RdeRevision` -- These entities are used by the RDE subsystem in the process
+  of generating files.
+* `Recurring` -- A Recurring is a billing event which represents a recurring
+  charge to the client (as opposed to OneTime).
+* `Registrar` -- These hold information about client registrars.
+* `RegistrarContact` -- Registrars have contacts just as domains do. These are
+  stored in a special RegistrarContact entity.
+* `RegistrarCredit` and `RegistrarCreditBalance` -- The system supports the
+  concept of a registrar credit balance, which is a pool of credit that the
+  registrar can use to offset amounts they owe. This might come from promotions,
+  for instance. These entities maintain registrars' balances.
+* `Registry` -- These hold information about the TLDs supported by the Registry
+  system.
+* `RegistryCursor` -- These entities are the predecessor to the Cursor
+  entities. We are no longer using them, and will be deleting them soon.
+* `ReservedList` -- Each ReservedList entity represents an entire list of
+  reserved names which cannot be registered. Each TLD can have one or more
+  attached reserved lists.
+* `ServerSecret` -- this is a single entity containing the secret numbers used
+  for generating tokens such as XSRF tokens.
+* `SignedMarkRevocationList` -- The entities together contain the Signed Mark
+  Data Revocation List file downloaded from the TMCH MarksDB each day. Each
+  entity contains up to 10,000 rows of the file, so depending on the size of the
+  file, there will be some handful of entities.
+* `TmchCrl` -- This is a single entity containing ICANN's TMCH CA Certificate
+  Revocation List.

 ## Cloud Storage buckets

+The Domain Registry platform uses
+[Cloud Storage](https://cloud.google.com/storage/) for bulk storage of large
+flat files that aren't suitable for Datastore. These files include backups, RDE
+exports, Datastore snapshots (for ingestion into BigQuery), and reports. Each
+bucket name must be unique across all of Google Cloud Storage, so we use the
+common recommended pattern of prefixing all buckets with the name of the App
+Engine app (which is itself globally unique). Most of the bucket names are
+configurable, but the defaults are as follows, with PROJECT standing in as a
+placeholder for the App Engine app name:
+
+* `PROJECT-billing` -- Monthly invoice files for each registrar.
+* `PROJECT-commits` -- Daily exports of commit logs that are needed for
+  potentially performing a restore.
+* `PROJECT-domain-lists` -- Daily exports of all registered domain names per
+  TLD.
+* `PROJECT-gcs-logs` -- This bucket is used at Google to store the GCS access
+  logs and storage data. This bucket is not required by the Registry system,
+  but can provide useful logging information. For instructions on setup, see
+  the
+  [Cloud Storage documentation](https://cloud.google.com/storage/docs/access-logs).
+* `PROJECT-icann-brda` -- This bucket contains the weekly ICANN BRDA files.
+  There is no lifecycle expiration; we keep a history of all the files. This
+  bucket must exist for the BRDA process to function.
+* `PROJECT-icann-zfa` -- This bucket contains the most recent ICANN ZFA
+  files. No lifecycle is needed, because the files are overwritten each time.
+* `PROJECT-rde` -- This bucket contains RDE exports, which should then be
+  regularly uploaded to the escrow provider. Lifecycle is set to 90 days. The
+  bucket must exist.
+* `PROJECT-reporting` -- Contains monthly ICANN reporting files.
+* `PROJECT-snapshots` -- Contains daily exports of Datastore entities of types
+  defined in `ExportConstants.java`. These are imported into BigQuery daily to
+  allow for in-depth querying.
+* `PROJECT.appspot.com` -- Temporary MapReduce files are stored here. By
+  default, the App Engine MapReduce library places its temporary files in a
+  bucket named {project}.appspot.com. This bucket must exist. To keep temporary
+  files from building up, a 90-day or 180-day lifecycle should be applied to the
+  bucket, depending on how long you want to be able to go back and debug
+  MapReduce problems. At 30 GB per day of generated temporary files, this bucket
+  may be the largest consumer of storage, so only save what you actually use.
+
+## Commit logs
+
 ## Web.xml

 ## Cursors