Add more explanation to architecture document

This also renames the document to clarify its scope as being all of Google Cloud Platform, not just App Engine.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=169543846
2025-07-15 23:45:14 +02:00 · 2017-09-21 08:02:15 -07:00
commit c64e9fe788
parent e0f432aafb
4 changed files with 87 additions and 72 deletions
--- a/docs/app-engine-architecture.md
+++ b/docs/app-engine-architecture.md
@@ -1,12 +1,21 @@
-# App Engine architecture
+# Architecture

-This document contains information on the overall architecture of Nomulus as
-pertains to App Engine.
+This document contains information on the overall architecture of Nomulus on
+[Google Cloud Platform](https://cloud.google.com/). It covers the App Engine
+architecture as well as other Cloud Platform services used by Nomulus.

-## Services
+## App Engine

-Nomulus contains three
-[services](https://cloud.google.com/appengine/docs/python/an-overview-of-app-engine),
+[Google App Engine](https://cloud.google.com/appengine/) is a cloud computing
+platform that runs web applications in the form of servlets. Nomulus consists of
+Java servlets that process web requests. These servlets use other features
+provided by App Engine, including task queues and cron jobs, as explained
+below.
+
+### Services
+
+Nomulus contains three [App Engine
+services](https://cloud.google.com/appengine/docs/python/an-overview-of-app-engine),
 which were previously called modules in earlier versions of App Engine. The
 services are: default (also called front-end), backend, and tools. Each service
 runs independently in a lot of ways, including that they can be upgraded
@@ -25,7 +34,7 @@ The reason that the dot is escaped rather than forming subdomains is because the
 SSL certificate for `appspot.com` is only valid for `*.appspot.com` (no double
 wild-cards).

-### Default service
+#### Default service

 The default service is responsible for all registrar-facing
 [EPP](https://en.wikipedia.org/wiki/Extensible_Provisioning_Protocol) command
@@ -36,7 +45,7 @@ begin to impact users immediately. Requests to the default service are handled
 by the `FrontendServlet`, which provides all of the endpoints exposed in
 `FrontendRequestComponent`.

-### Backend service
+#### Backend service

 The backend service is responsible for executing all regularly scheduled
 background tasks (using cron) as well as all asynchronous tasks. Requests to the
@@ -57,7 +66,7 @@ sized to support not just the normal ongoing DNS load but also the load incurred
 by MapReduces, both scheduled (such as RDE) and on-demand (asynchronous
 contact/host deletion).

-### Tools service
+#### Tools service

 The tools service is responsible for servicing requests from the `nomulus`
 command line tool, which provides administrative-level functionality for
@@ -74,18 +83,19 @@ tool subcommands like `generate_zone_files` and by manually hitting URLs under
 https://tools-dot-project-id.appspot.com, like
 `/_dr/task/refreshDnsForAllDomains`.

-## Task queues
+### Task queues

-[Task queues](https://cloud.google.com/appengine/docs/java/taskqueue/) in App
-Engine provide an asynchronous way to enqueue tasks and then execute them on
-some kind of schedule. There are two types of queues, push queues and pull
-queues. Tasks in push queues are always executing up to some throttlable limit.
-Tasks in pull queues remain there until the queue is polled by code that is
-running for some other reason. Essentially, push queues run their own tasks
-while pull queues just enqueue data that is used by something else.
-Many other parts of App Engine are implemented using task queues. For example,
-[App Engine cron](https://cloud.google.com/appengine/docs/java/config/cron) adds
-tasks to push queues at regularly scheduled intervals, and the [MapReduce
+App Engine [task
+queues](https://cloud.google.com/appengine/docs/java/taskqueue/) provide an
+asynchronous way to enqueue tasks and then execute them on some kind of
+schedule. There are two types of queues, push queues and pull queues. Tasks in
+push queues are always executing up to some throttlable limit. Tasks in pull
+queues remain there until the queue is polled by code that is running for some
+other reason. Essentially, push queues run their own tasks while pull queues
+just enqueue data that is used by something else. Many other parts of App Engine
+are implemented using task queues. For example, [App Engine
+cron](https://cloud.google.com/appengine/docs/java/config/cron) adds tasks to
+push queues at regularly scheduled intervals, and the [MapReduce
 framework](https://cloud.google.com/appengine/docs/java/dataprocessing/) adds
 tasks for each phase of the MapReduce algorithm.

@@ -183,12 +193,60 @@ explicitly marked as otherwise.
    spreadsheet. Tasks are enqueued by `RegistrarServlet` when changes are made
    to registrar fields and are executed by `SyncRegistrarsSheetAction`.

+### Cron jobs
+
+Nomulus uses App Engine [cron
+jobs](https://cloud.google.com/appengine/docs/java/config/cron) to run periodic
+scheduled actions. These actions run as frequently as once per minute (in the
+case of syncing DNS updates) or as infrequently as once per month (in the case
+of RDE exports). Cron tasks are specified in `cron.xml` files, with one per
+environment. There are more tasks that run in Production than in other
+environments because tasks like uploading RDE dumps are only done for the live
+system. Cron tasks execute on the `backend` service.
+
+Most cron tasks use the `TldFanoutAction` which is accessed via the
+`/_dr/cron/fanout` URL path. This action, which is run by the BackendServlet on
+the backend service, fans out a given cron task for each TLD that exists in the
+registry system, using the queue that is specified in the `cron.xml` entry.
+Because some tasks may be computationally intensive and could risk spiking
+system latency if all start executing immediately at the same time, there is a
+`jitterSeconds` parameter that spreads out tasks over the given number of
+seconds. This is used with DNS updates and commit log deletion.
+
+The reason the `TldFanoutAction` exists is that a lot of tasks need to be done
+separately for each TLD, such as RDE exports and NORDN uploads. It's simpler to
+have a single cron entry that will create tasks for all TLDs than to have to
+specify a separate cron task for each action for each TLD (though that is still
+an option). Task queues also provide retry semantics in the event of transient
+failures that a raw cron task does not. This is why there are some tasks that do
+not fan out across TLDs that still use `TldFanoutAction` -- it's so that the
+tasks retry in the face of transient errors.
+
+The full list of URL parameters to `TldFanoutAction` that can be specified in
+cron.xml is:
+
+*   `endpoint` -- The path of the action that should be executed (see
+    `web.xml`).
+*   `queue` -- The cron queue to enqueue tasks in.
+*   `forEachRealTld` -- Specifies that the task should be run in each TLD of
+    type `REAL`. This can be combined with `forEachTestTld`.
+*   `forEachTestTld` -- Specifies that the task should be run in each TLD of
+    type `TEST`. This can be combined with `forEachRealTld`.
+*   `runInEmpty` -- Specifies that the task should be run globally, i.e. just
+    once, rather than individually per TLD. This is provided to allow tasks to
+    retry. It is called "`runInEmpty`" for historical reasons.
+*   `excludes` -- A list of TLDs to exclude from processing.
+*   `jitterSeconds` -- The execution of each per-TLD task is delayed by a
+    different random number of seconds between zero and this max value.
+
 ## Environments

 Nomulus comes pre-configured with support for a number of different
 environments, all of which are used in Google's registry system. Other registry
 operators may choose to use more or fewer environments, depending on their
-needs.
+needs. Each environment consists of a separate Google Cloud Platform project,
+which includes a separate database and separate bulk storage in Cloud Storage.
+Each environment is thus completely independent.

 The different environments are specified in `RegistryEnvironment`. Most
 correspond to a separate App Engine app except for `UNITTEST` and `LOCAL`, which
@@ -243,49 +301,6 @@ of experience running a production registry using this codebase.
    errors, it can be pushed to Production.
 5.  Repeat once weekly, or potentially more often.

-## Cron tasks
-
-All [cron tasks](https://cloud.google.com/appengine/docs/java/config/cron) are
-specified in `cron.xml` files, with one per environment. There are more tasks
-that execute in Production than in other environments, because tasks like
-uploading RDE dumps are only done for the live system. Cron tasks execute on the
-`backend` service.
-
-Most cron tasks use the `TldFanoutAction` which is accessed via the
-`/_dr/cron/fanout` URL path. This action, which is run by the BackendServlet on
-the backend service, fans out a given cron task for each TLD that exists in the
-registry system, using the queue that is specified in the `cron.xml` entry.
-Because some tasks may be computationally intensive and could risk spiking
-system latency if all start executing immediately at the same time, there is a
-`jitterSeconds` parameter that spreads out tasks over the given number of
-seconds. This is used with DNS updates and commit log deletion.
-
-The reason the `TldFanoutAction` exists is that a lot of tasks need to be done
-separately for each TLD, such as RDE exports and NORDN uploads. It's simpler to
-have a single cron entry that will create tasks for all TLDs than to have to
-specify a separate cron task for each action for each TLD (though that is still
-an option). Task queues also provide retry semantics in the event of transient
-failures that a raw cron task does not. This is why there are some tasks that do
-not fan out across TLDs that still use `TldFanoutAction` -- it's so that the
-tasks retry in the face of transient errors.
-
-The full list of URL parameters to `TldFanoutAction` that can be specified in
-cron.xml is:
-
-*   `endpoint` -- The path of the action that should be executed (see
-    `web.xml`).
-*   `queue` -- The cron queue to enqueue tasks in.
-*   `forEachRealTld` -- Specifies that the task should be run in each TLD of
-    type `REAL`. This can be combined with `forEachTestTld`.
-*   `forEachTestTld` -- Specifies that the task should be run in each TLD of
-    type `TEST`. This can be combined with `forEachRealTld`.
-*   `runInEmpty` -- Specifies that the task should be run globally, i.e. just
-    once, rather than individually per TLD. This is provided to allow tasks to
-    retry. It is called "`runInEmpty`" for historical reasons.
-*   `excludes` -- A list of TLDs to exclude from processing.
-*   `jitterSeconds` -- The execution of each per-TLD task is delayed by a
-    different random number of seconds between zero and this max value.
-
 ## Cloud Datastore

 Nomulus uses [Cloud
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -12,8 +12,8 @@ updated by running `nomulus` commands without having to deploy a new version.
 Here's a checklist of things that need to be configured upon initial
 installation of the project:

-*   Create Google Cloud Storage buckets (see the [App Engine architecture
-    guide](./app-engine-architecture.md)).
+*   Create Google Cloud Storage buckets (see the [Architecture
+    documentation](./architecture.md) for more information).
 *   Modify `ConfigModule.java` and set project-specific settings such as product
    name (see below).
 *   Copy and edit `ProductionRegistryConfigExample.java` with your
@@ -28,8 +28,8 @@ different values for different environments. This is especially pronounced in
 the `UNITTEST` and `LOCAL` environments, which don't run on App Engine at all.
 As an example, some timeouts may be long in production and short in unit tests.

-See the [App Engine architecture](./app-engine-architecture.md) documentation
-for more details on environments as used by Nomulus.
+See the [Architecture documentation](./architecture.md) for more details on
+environments as used by Nomulus.

 ## App Engine configuration

--- a/docs/install.md
+++ b/docs/install.md
@@ -106,8 +106,8 @@ Cloud Platform. Make sure to choose a good Project ID, as it will be used
 repeatedly in a large number of places. If your company is named Acme, then a
 good Project ID for your production environment would be "acme-registry". Keep
 in mind that project IDs for non-production environments should be suffixed with
-the name of the environment (see the [App Engine architecture
-guide](./app-engine-architecture.md) for more details). For the purposes of this
+the name of the environment (see the [Architecture
+documentation](./architecture.md) for more details). For the purposes of this
 example we'll deploy to the "alpha" environment, which is used for developer
 testing. The Project ID will thus be `acme-registry-alpha`.

--- a/docs/rde-import-usage.md
+++ b/docs/rde-import-usage.md
@@ -25,8 +25,8 @@ the [first steps tutorial](./first-steps-tutorial.md).
 ## How to load an escrow file

 First of all, ensure that all of the cloud storage buckets are set up for
-nomulus.  See the [architecture documentation](./app-engine-architecture.md) for
-details.  The escrow file that will be imported should be uploaded to the
+nomulus. See the [Architecture documentation](./architecture.md) for details.
+The escrow file that will be imported should be uploaded to the
 `PROJECT-rde-import` cloud storage bucket. The escrow file should not be
 compressed or encrypted. When launching each mapreduce job, reference the
 absolute path to the file (just the path, not the bucket name) in the `path`
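
As a rough illustration of the `excludes` and `jitterSeconds` parameters described in the "Cron jobs" section added by this change, the per-TLD fan-out could be sketched as follows. This is not the actual `TldFanoutAction` code: the class `FanoutSketch`, its method names, and the TLD names are hypothetical, and treating the upper bound of the delay as inclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Illustrative sketch only; not the actual Nomulus TldFanoutAction. */
final class FanoutSketch {

  /** Drops excluded TLDs, preserving the input order. */
  static List<String> selectTlds(List<String> tlds, Set<String> excludes) {
    List<String> selected = new ArrayList<>();
    for (String tld : tlds) {
      if (!excludes.contains(tld)) {
        selected.add(tld);
      }
    }
    return selected;
  }

  /** Assigns each TLD a random delay between 0 and jitterSeconds (assumed inclusive). */
  static Map<String, Integer> assignDelays(List<String> tlds, int jitterSeconds, Random random) {
    Map<String, Integer> delays = new LinkedHashMap<>();
    for (String tld : tlds) {
      // nextInt(bound) excludes the bound, so add 1 to allow jitterSeconds itself.
      int delay = jitterSeconds > 0 ? random.nextInt(jitterSeconds + 1) : 0;
      delays.put(tld, delay);
    }
    return delays;
  }

  public static void main(String[] args) {
    // Hypothetical TLD names, for illustration only.
    List<String> tlds = selectTlds(List.of("example", "demo", "sandbox"), Set.of("demo"));
    Map<String, Integer> delays = assignDelays(tlds, 240, new Random());
    delays.forEach((tld, delay) ->
        System.out.println("enqueue task for " + tld + " with a delay of " + delay + "s"));
  }
}
```

Spreading the delays this way keeps every per-TLD task from starting at the same moment, which is the latency concern the architecture document gives for `jitterSeconds`.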