From 6fc7eb40c65334c6e54aa80df96c7631308d460d Mon Sep 17 00:00:00 2001
From: mcilwain <mcilwain@google.com>
Date: Wed, 3 Aug 2016 13:38:03 -0700
Subject: [PATCH] Add more documentation on cron, Datastore, and Cloud Storage

Note that a lot of this is adapted from existing non-Markdown documentation written by Brian.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=129252200
---
 docs/app-engine-architecture.md | 165 +++++++++++++++++++++++++++++++-
 1 file changed, 162 insertions(+), 3 deletions(-)

diff --git a/docs/app-engine-architecture.md b/docs/app-engine-architecture.md
index 9fb7a0ced..daf414062 100644
--- a/docs/app-engine-architecture.md
+++ b/docs/app-engine-architecture.md
@@ -230,7 +230,8 @@ of experience running a production registry using this codebase.
 All [cron tasks](https://cloud.google.com/appengine/docs/java/config/cron) are
 specified in `cron.xml` files, with one per environment.  There are more tasks
 that execute in Production than in other environments, because tasks like
-uploading RDE dumps are only done for the live system.
+uploading RDE dumps are only done for the live system.  Cron tasks execute on
+the `backend` service.
 
 Most cron tasks use the `TldFanoutAction` which is accessed via the
 `/_dr/cron/fanout` URL path.  This action, which is run by the BackendServlet on
@@ -245,12 +246,170 @@ The reason the `TldFanoutAction` exists is that a lot of tasks need to be done
 separately for each TLD, such as RDE exports and NORDN uploads.  It's simpler to
 have a single cron entry that will create tasks for all TLDs than to have to
 specify a separate cron task for each action for each TLD (though that is still
-an option).
+an option).  Task queues also retry tasks after transient failures, which a raw
+cron task does not.  This is why some tasks that do not fan out across TLDs
+still use `TldFanoutAction`: routing them through a task queue lets them retry
+in the face of transient errors.
 
-## Datastore entities
+The full list of URL parameters to `TldFanoutAction` that can be specified in
+`cron.xml` is:
+* `endpoint` -- The path of the action that should be executed (see `web.xml`).
+* `queue` -- The cron queue to enqueue tasks in.
+* `forEachRealTld` -- Specifies that the task should be run in each TLD of type
+  `REAL`.  This can be combined with `forEachTestTld`.
+* `forEachTestTld` -- Specifies that the task should be run in each TLD of type
+  `TEST`.  This can be combined with `forEachRealTld`.
+* `runInEmpty` -- Specifies that the task should be run globally, i.e. just
+  once, rather than individually per TLD.  This is provided to allow tasks to
+  retry.  It is called "`runInEmpty`" for historical reasons.
+* `excludes` -- A list of TLDs to exclude from processing.
+* `jitterSeconds` -- The execution of each per-TLD task is delayed by a
+  different random number of seconds between zero and this max value.
+
+## Cloud Datastore
+
+The Domain Registry platform uses
+[Cloud Datastore](https://cloud.google.com/appengine/docs/java/datastore/) as
+its primary database.  Cloud Datastore is a NoSQL document database that
+provides automatic horizontal scaling, high performance, and high availability.
+All information that is persisted to Cloud Datastore takes the form of Java
+classes annotated with `@Entity` that are located in the `model` package.  The
+[Objectify library](https://cloud.google.com/appengine/docs/java/gettingstarted/using-datastore-objectify)
+is used to persist instances of these classes in a format that Datastore
+understands.
+
+A brief overview of the different entity types found in the App Engine Datastore
+Viewer may help administrators understand what they are seeing.  Note that some
+of these entities are part of App Engine tools that are outside of the domain
+registry codebase:
+
+* `_AE_*` -- These entities are created by App Engine.
+* `_ah_SESSION` -- These entities track App Engine client sessions.
+* `_GAE_MR_*` -- These entities are generated by App Engine while running
+  MapReduces.
+* `BackupStatus` -- There should only be one of these entities, used to maintain
+  the state of the backup process.
+* `Cancellation` -- A cancellation is a special type of billing event which
+  represents the cancellation of another billing event such as a OneTime or
+  Recurring.
+* `ClaimsList`, `ClaimsListShard`, and `ClaimsListSingleton` -- These entities
+  store the TMCH claims list, for use in trademark processing.
+* `CommitLog*` -- These entities store the commit log information.
+* `ContactResource` -- These hold the ICANN contact information (but not
+  registrar contacts, which have a separate entity type).
+* `Cursor` -- We use Cursor entities to maintain state about daily processes,
+  remembering which dates have been processed. For instance, for the RDE export,
+  Cursor entities maintain the date up to which each TLD has been exported.
+* `DomainApplicationIndex` -- These hold domain applications received during the
+  sunrise period.
+* `DomainBase` -- These hold the ICANN domain information.
+* `DomainRecord` -- These are used during the DNS update process.
+* `EntityGroupRoot` -- There is only one EntityGroupRoot entity, which serves as
+  the Datastore parent of many other entities.
+* `EppResourceIndex` -- These entities allow enumeration of EPP resources (such
+  as domains, hosts and contacts), which would otherwise be difficult to do in
+  Datastore.
+* `ExceptionReportEntity` -- These entities are generated automatically by
+  ECatcher, a Google-internal logging and debugging tool. Non-Google users
+  should not encounter these entries.
+* `ForeignKeyContactIndex`, `ForeignKeyDomainIndex`, and `ForeignKeyHostIndex`
+  -- These act as a unique index on contacts, domains and hosts, allowing
+  transactional lookup by foreign key.
+* `HistoryEntry` -- A HistoryEntry is the record of a command which mutated an
+  EPP resource. It serves as the parent of BillingEvents and PollMessages.
+* `HostRecord` -- These are used during the DNS update process.
+* `HostResource` -- These hold the ICANN host information.
+* `Lock` -- Lock entities are used to control access to a shared resource such
+  as an App Engine queue. Under ordinary circumstances, these locks will be
+  cleaned up automatically, and should not accumulate.
+* `LogsExportCursor` -- This is a single entity which maintains the state of log
+  export.
+* `MR-*` -- These entities are generated by the App Engine MapReduce library in
+  the course of running MapReduces.
+* `Modification` -- A Modification is a special type of billing event which
+  represents the modification of a OneTime billing event.
+* `OneTime` -- A OneTime is a billing event which represents a one-time charge
+  or credit to the client (as opposed to Recurring).
+* `pipeline-*` -- These entities are also generated by the App Engine MapReduce
+  library.
+* `PollMessage` -- PollMessages are generated by the system to notify registrars
+  of asynchronous responses and status changes.
+* `PremiumList`, `PremiumListEntry`, and `PremiumListRevision` -- The standard
+  method for determining which domain names receive premium pricing is to
+  maintain a static list of premium names. Each PremiumList contains some number
+  of PremiumListRevisions, each of which in turn contains a PremiumListEntry for
+  each premium name.
+* `RdeRevision` -- These entities are used by the RDE subsystem in the process
+  of generating files.
+* `Recurring` -- A Recurring is a billing event which represents a recurring
+  charge to the client (as opposed to OneTime).
+* `Registrar` -- These hold information about client registrars.
+* `RegistrarContact` -- Registrars have contacts just as domains do. These are
+  stored in a special RegistrarContact entity.
+* `RegistrarCredit` and `RegistrarCreditBalance` -- The system supports the
+  concept of a registrar credit balance, which is a pool of credit that the
+  registrar can use to offset amounts they owe. This might come from promotions,
+  for instance. These entities maintain registrars' balances.
+* `Registry` -- These hold information about the TLDs supported by the Registry
+  system.
+* `RegistryCursor` -- These entities are the predecessor to the Cursor
+  entities. We are no longer using them, and will be deleting them soon.
+* `ReservedList` -- Each ReservedList entity represents an entire list of
+  reserved names which cannot be registered. Each TLD can have one or more
+  attached reserved lists.
+* `ServerSecret` -- This is a single entity containing the secret numbers used
+  for generating tokens such as XSRF tokens.
+* `SignedMarkRevocationList` -- These entities together contain the Signed Mark
+  Data Revocation List file downloaded from the TMCH MarksDB each day. Each
+  entity contains up to 10,000 rows of the file, so depending on the size of the
+  file, there will be a handful of entities.
+* `TmchCrl` -- This is a single entity containing ICANN's TMCH CA Certificate
+  Revocation List.
 
 ## Cloud Storage buckets
 
+The Domain Registry platform uses
+[Cloud Storage](https://cloud.google.com/storage/) for bulk storage of large
+flat files that aren't suitable for Datastore.  These files include backups, RDE
+exports, Datastore snapshots (for ingestion into BigQuery), and reports.  Each
+bucket name must be unique across all of Google Cloud Storage, so we use the
+common recommended pattern of prefixing all buckets with the name of the App
+Engine app (which is itself globally unique).  Most of the bucket names are
+configurable, but the defaults are as follows, with PROJECT standing in as a
+placeholder for the App Engine app name:
+
+* `PROJECT-billing` -- Monthly invoice files for each registrar.
+* `PROJECT-commits` -- Daily exports of commit logs that are needed for
+  potentially performing a restore.
+* `PROJECT-domain-lists` -- Daily exports of all registered domain names per
+  TLD.
+* `PROJECT-gcs-logs` -- This bucket is used at Google to store the GCS access
+  logs and storage data.  This bucket is not required by the Registry system,
+  but can provide useful logging information.  For instructions on setup, see
+  the
+  [Cloud Storage documentation](https://cloud.google.com/storage/docs/access-logs).
+* `PROJECT-icann-brda` -- This bucket contains the weekly ICANN BRDA files.
+  There is no lifecycle expiration; we keep a history of all the files.  This
+  bucket must exist for the BRDA process to function.
+* `PROJECT-icann-zfa` -- This bucket contains the most recent ICANN ZFA
+  files. No lifecycle is needed, because the files are overwritten each time.
+* `PROJECT-rde` -- This bucket contains RDE exports, which should then be
+  regularly uploaded to the escrow provider. Lifecycle is set to 90 days. The
+  bucket must exist.
+* `PROJECT-reporting` -- Contains monthly ICANN reporting files.
+* `PROJECT-snapshots` -- Contains daily exports of Datastore entities of types
+  defined in `ExportConstants.java`.  These are imported into BigQuery daily to
+  allow for in-depth querying.
+* `PROJECT.appspot.com` -- Temporary MapReduce files are stored here. By
+  default, the App Engine MapReduce library places its temporary files in a
+  bucket named `PROJECT.appspot.com`. This bucket must exist. To keep temporary
+  files from building up, a 90-day or 180-day lifecycle should be applied to the
+  bucket, depending on how long you want to be able to go back and debug
+  MapReduce problems. At 30 GB per day of generated temporary files, this bucket
+  may be the largest consumer of storage, so only save what you actually use.
+
+## Commit logs
+
 ## Web.xml
 
 ## Cursors