Add more explanation to architecture document

This also renames the document to clarify its scope as being all of Google Cloud Platform, not just App Engine. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=169543846
2025-07-04 02:03:24 +02:00 · 2017-09-21 08:02:15 -07:00 · 2017-09-21 08:02:15 -07:00 · c64e9fe788
commit c64e9fe788
parent e0f432aafb
4 changed files with 87 additions and 72 deletions
--- a/docs/app-engine-architecture.md
+++ b/docs/app-engine-architecture.md
@ -1,12 +1,21 @@
-# App Engine architecture
+# Architecture
-This document contains information on the overall architecture of Nomulus as
+This document contains information on the overall architecture of Nomulus on
-pertains to App Engine.
+[Google Cloud Platform](https://cloud.google.com/). It covers the App Engine
 architecture as well as other Cloud Platform services used by Nomulus.
-## Services
+## App Engine
-Nomulus contains three
+[Google App Engine](https://cloud.google.com/appengine/) is a cloud computing
-[services](https://cloud.google.com/appengine/docs/python/an-overview-of-app-engine),
+platform that runs web applications in the form of servlets. Nomulus consists of
 Java servlets that process web requests. These servlets use other features
 provided by App Engine, including task queues and cron jobs, as explained
 below.
 ### Services
 Nomulus contains three [App Engine
 services](https://cloud.google.com/appengine/docs/python/an-overview-of-app-engine),
 which were previously called modules in earlier versions of App Engine. The
 services are: default (also called front-end), backend, and tools. Each service
 runs independently in a lot of ways, including that they can be upgraded
@ -25,7 +34,7 @@ The reason that the dot is escaped rather than forming subdomains is because the
 SSL certificate for `appspot.com` is only valid for `*.appspot.com` (no double
 wild-cards).
-### Default service
+#### Default service
 The default service is responsible for all registrar-facing
 [EPP](https://en.wikipedia.org/wiki/Extensible_Provisioning_Protocol) command
@ -36,7 +45,7 @@ begin to impact users immediately. Requests to the default service are handled
 by the `FrontendServlet`, which provides all of the endpoints exposed in
 `FrontendRequestComponent`.
-### Backend service
+#### Backend service
 The backend service is responsible for executing all regularly scheduled
 background tasks (using cron) as well as all asynchronous tasks. Requests to the
@ -57,7 +66,7 @@ sized to support not just the normal ongoing DNS load but also the load incurred
 by MapReduces, both scheduled (such as RDE) and on-demand (asynchronous
 contact/host deletion).
-### Tools service
+#### Tools service
 The tools service is responsible for servicing requests from the `nomulus`
 command line tool, which provides administrative-level functionality for
@ -74,18 +83,19 @@ tool subcommands like `generate_zone_files` and by manually hitting URLs under
 https://tools-dot-project-id.appspot.com, like
 `/_dr/task/refreshDnsForAllDomains`.
-## Task queues
+### Task queues
-[Task queues](https://cloud.google.com/appengine/docs/java/taskqueue/) in App
+App Engine [task
-Engine provide an asynchronous way to enqueue tasks and then execute them on
+queues](https://cloud.google.com/appengine/docs/java/taskqueue/) provide an
-some kind of schedule. There are two types of queues, push queues and pull
+asynchronous way to enqueue tasks and then execute them on some kind of
-queues. Tasks in push queues are always executing up to some throttlable limit.
+schedule. There are two types of queues, push queues and pull queues. Tasks in
-Tasks in pull queues remain there until the queue is polled by code that is
+push queues are always executing up to some throttlable limit. Tasks in pull
-running for some other reason. Essentially, push queues run their own tasks
+queues remain there until the queue is polled by code that is running for some
-while pull queues just enqueue data that is used by something else.
+other reason. Essentially, push queues run their own tasks while pull queues
-Many other parts of App Engine are implemented using task queues. For example,
+just enqueue data that is used by something else. Many other parts of App Engine
-[App Engine cron](https://cloud.google.com/appengine/docs/java/config/cron) adds
+are implemented using task queues. For example, [App Engine
-tasks to push queues at regularly scheduled intervals, and the [MapReduce
+cron](https://cloud.google.com/appengine/docs/java/config/cron) adds tasks to
 push queues at regularly scheduled intervals, and the [MapReduce
 framework](https://cloud.google.com/appengine/docs/java/dataprocessing/) adds
 tasks for each phase of the MapReduce algorithm.
@ -183,12 +193,60 @@ explicitly marked as otherwise.
    spreadsheet. Tasks are enqueued by `RegistrarServlet` when changes are made
    to registrar fields and are executed by `SyncRegistrarsSheetAction`.
 ### Cron jobs
 Nomulus uses App Engine [cron
 jobs](https://cloud.google.com/appengine/docs/java/config/cron) to run periodic
 scheduled actions. These actions run as frequently as once per minute (in the
 case of syncing DNS updates) or as infrequently as once per month (in the case
 of RDE exports). Cron tasks are specified in `cron.xml` files, with one per
 environment. There are more tasks that run in Production than in other
 environments because tasks like uploading RDE dumps are only done for the live
 system. Cron tasks execute on the `backend` service.
 Most cron tasks use the `TldFanoutAction` which is accessed via the
 `/_dr/cron/fanout` URL path. This action, which is run by the BackendServlet on
 the backend service, fans out a given cron task for each TLD that exists in the
 registry system, using the queue that is specified in the `cron.xml` entry.
 Because some tasks may be computationally intensive and could risk spiking
 system latency if all start executing immediately at the same time, there is a
 `jitterSeconds` parameter that spreads out tasks over the given number of
 seconds. This is used with DNS updates and commit log deletion.
 The reason the `TldFanoutAction` exists is that a lot of tasks need to be done
 separately for each TLD, such as RDE exports and NORDN uploads. It's simpler to
 have a single cron entry that will create tasks for all TLDs than to have to
 specify a separate cron task for each action for each TLD (though that is still
 an option). Task queues also provide retry semantics in the event of transient
 failures that a raw cron task does not. This is why there are some tasks that do
 not fan out across TLDs that still use `TldFanoutAction` -- it's so that the
 tasks retry in the face of transient errors.
 The full list of URL parameters to `TldFanoutAction` that can be specified in
 cron.xml is:
 *   `endpoint` -- The path of the action that should be executed (see
    `web.xml`).
 *   `queue` -- The cron queue to enqueue tasks in.
 *   `forEachRealTld` -- Specifies that the task should be run in each TLD of
    type `REAL`. This can be combined with `forEachTestTld`.
 *   `forEachTestTld` -- Specifies that the task should be run in each TLD of
    type `TEST`. This can be combined with `forEachRealTld`.
 *   `runInEmpty` -- Specifies that the task should be run globally, i.e. just
    once, rather than individually per TLD. This is provided to allow tasks to
    retry. It is called "`runInEmpty`" for historical reasons.
 *   `excludes` -- A list of TLDs to exclude from processing.
 *   `jitterSeconds` -- The execution of each per-TLD task is delayed by a
    different random number of seconds between zero and this max value.
 ## Environments
 Nomulus comes pre-configured with support for a number of different
 environments, all of which are used in Google's registry system. Other registry
 operators may choose to use more or fewer environments, depending on their
-needs.
+needs. Each environment consists of a separate Google Cloud Platform project,
 which includes a separate database and separate bulk storage in Cloud Storage.
 Each environment is thus completely independent.
 The different environments are specified in `RegistryEnvironment`. Most
 correspond to a separate App Engine app except for `UNITTEST` and `LOCAL`, which
@ -243,49 +301,6 @@ of experience running a production registry using this codebase.
    errors, it can be pushed to Production.
 5.  Repeat once weekly, or potentially more often.
 ## Cron tasks
 All [cron tasks](https://cloud.google.com/appengine/docs/java/config/cron) are
 specified in `cron.xml` files, with one per environment. There are more tasks
 that execute in Production than in other environments, because tasks like
 uploading RDE dumps are only done for the live system. Cron tasks execute on the
 `backend` service.
 Most cron tasks use the `TldFanoutAction` which is accessed via the
 `/_dr/cron/fanout` URL path. This action, which is run by the BackendServlet on
 the backend service, fans out a given cron task for each TLD that exists in the
 registry system, using the queue that is specified in the `cron.xml` entry.
 Because some tasks may be computationally intensive and could risk spiking
 system latency if all start executing immediately at the same time, there is a
 `jitterSeconds` parameter that spreads out tasks over the given number of
 seconds. This is used with DNS updates and commit log deletion.
 The reason the `TldFanoutAction` exists is that a lot of tasks need to be done
 separately for each TLD, such as RDE exports and NORDN uploads. It's simpler to
 have a single cron entry that will create tasks for all TLDs than to have to
 specify a separate cron task for each action for each TLD (though that is still
 an option). Task queues also provide retry semantics in the event of transient
 failures that a raw cron task does not. This is why there are some tasks that do
 not fan out across TLDs that still use `TldFanoutAction` -- it's so that the
 tasks retry in the face of transient errors.
 The full list of URL parameters to `TldFanoutAction` that can be specified in
 cron.xml is:
 *   `endpoint` -- The path of the action that should be executed (see
    `web.xml`).
 *   `queue` -- The cron queue to enqueue tasks in.
 *   `forEachRealTld` -- Specifies that the task should be run in each TLD of
    type `REAL`. This can be combined with `forEachTestTld`.
 *   `forEachTestTld` -- Specifies that the task should be run in each TLD of
    type `TEST`. This can be combined with `forEachRealTld`.
 *   `runInEmpty` -- Specifies that the task should be run globally, i.e. just
    once, rather than individually per TLD. This is provided to allow tasks to
    retry. It is called "`runInEmpty`" for historical reasons.
 *   `excludes` -- A list of TLDs to exclude from processing.
 *   `jitterSeconds` -- The execution of each per-TLD task is delayed by a
    different random number of seconds between zero and this max value.
 ## Cloud Datastore
 Nomulus uses [Cloud
--- a/docs/configuration.md
+++ b/docs/configuration.md
@ -12,8 +12,8 @@ updated by running `nomulus` commands without having to deploy a new version.
 Here's a checklist of things that need to be configured upon initial
 installation of the project:
-*   Create Google Cloud Storage buckets (see the [App Engine architecture
+*   Create Google Cloud Storage buckets (see the [Architecture
-    guide](./app-engine-architecture.md)).
+    documentation](./architecture.md) for more information).
 *   Modify `ConfigModule.java` and set project-specific settings such as product
    name (see below).
 *   Copy and edit `ProductionRegistryConfigExample.java` with your
@ -28,8 +28,8 @@ different values for different environments. This is especially pronounced in
 the `UNITTEST` and `LOCAL` environments, which don't run on App Engine at all.
 As an example, some timeouts may be long in production and short in unit tests.
-See the [App Engine architecture](./app-engine-architecture.md) documentation
+See the [Architecture documentation](./architecture.md) for more details on
-for more details on environments as used by Nomulus.
+environments as used by Nomulus.
 ## App Engine configuration
--- a/docs/install.md
+++ b/docs/install.md
@ -106,8 +106,8 @@ Cloud Platform. Make sure to choose a good Project ID, as it will be used
 repeatedly in a large number of places. If your company is named Acme, then a
 good Project ID for your production environment would be "acme-registry". Keep
 in mind that project IDs for non-production environments should be suffixed with
-the name of the environment (see the [App Engine architecture
+the name of the environment (see the [Architecture
-guide](./app-engine-architecture.md) for more details). For the purposes of this
+documentation](./architecture.md) for more details). For the purposes of this
 example we'll deploy to the "alpha" environment, which is used for developer
 testing. The Project ID will thus be `acme-registry-alpha`.
--- a/docs/rde-import-usage.md
+++ b/docs/rde-import-usage.md
@ -25,8 +25,8 @@ the [first steps tutorial](./first-steps-tutorial.md).
 ## How to load an escrow file
 First of all, ensure that all of the cloud storage buckets are set up for
-nomulus.  See the [architecture documentation](./app-engine-architecture.md) for
+nomulus. See the [Architecture documentation](./architecture.md) for details.
-details.  The escrow file that will be imported should be uploaded to the
+The escrow file that will be imported should be uploaded to the
 `PROJECT-rde-import` cloud storage bucket. The escrow file should not be
 compressed or encrypted. When launching each mapreduce job, reference the
 absolute path to the file (just the path, not the bucket name) in the `path`