From c64e9fe788476dd9c0067aa721f1c9d4ae3a1edd Mon Sep 17 00:00:00 2001 From: mcilwain Date: Thu, 21 Sep 2017 08:02:15 -0700 Subject: [PATCH] Add more explanation to architecture document This also renames the document to clarify its scope as being all of Google Cloud Platform, not just App Engine. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=169543846 --- ...engine-architecture.md => architecture.md} | 143 ++++++++++-------- docs/configuration.md | 8 +- docs/install.md | 4 +- docs/rde-import-usage.md | 4 +- 4 files changed, 87 insertions(+), 72 deletions(-) rename docs/{app-engine-architecture.md => architecture.md} (91%) diff --git a/docs/app-engine-architecture.md b/docs/architecture.md similarity index 91% rename from docs/app-engine-architecture.md rename to docs/architecture.md index 535b924a0..4e3c839ac 100644 --- a/docs/app-engine-architecture.md +++ b/docs/architecture.md @@ -1,12 +1,21 @@ -# App Engine architecture +# Architecture -This document contains information on the overall architecture of Nomulus as -pertains to App Engine. +This document contains information on the overall architecture of Nomulus on +[Google Cloud Platform](https://cloud.google.com/). It covers the App Engine +architecture as well as other Cloud Platform services used by Nomulus. -## Services +## App Engine -Nomulus contains three -[services](https://cloud.google.com/appengine/docs/python/an-overview-of-app-engine), +[Google App Engine](https://cloud.google.com/appengine/) is a cloud computing +platform that runs web applications in the form of servlets. Nomulus consists of +Java servlets that process web requests. These servlets use other features +provided by App Engine, including task queues and cron jobs, as explained +below. + +### Services + +Nomulus contains three [App Engine +services](https://cloud.google.com/appengine/docs/python/an-overview-of-app-engine), which were previously called modules in earlier versions of App Engine. 
The services are: default (also called front-end), backend, and tools. Each service runs independently in a lot of ways, including that they can be upgraded @@ -25,7 +34,7 @@ The reason that the dot is escaped rather than forming subdomains is because the SSL certificate for `appspot.com` is only valid for `*.appspot.com` (no double wild-cards). -### Default service +#### Default service The default service is responsible for all registrar-facing [EPP](https://en.wikipedia.org/wiki/Extensible_Provisioning_Protocol) command @@ -36,7 +45,7 @@ begin to impact users immediately. Requests to the default service are handled by the `FrontendServlet`, which provides all of the endpoints exposed in `FrontendRequestComponent`. -### Backend service +#### Backend service The backend service is responsible for executing all regularly scheduled background tasks (using cron) as well as all asynchronous tasks. Requests to the @@ -57,7 +66,7 @@ sized to support not just the normal ongoing DNS load but also the load incurred by MapReduces, both scheduled (such as RDE) and on-demand (asynchronous contact/host deletion). -### Tools service +#### Tools service The tools service is responsible for servicing requests from the `nomulus` command line tool, which provides administrative-level functionality for @@ -74,18 +83,19 @@ tool subcommands like `generate_zone_files` and by manually hitting URLs under https://tools-dot-project-id.appspot.com, like `/_dr/task/refreshDnsForAllDomains`. -## Task queues +### Task queues -[Task queues](https://cloud.google.com/appengine/docs/java/taskqueue/) in App -Engine provide an asynchronous way to enqueue tasks and then execute them on -some kind of schedule. There are two types of queues, push queues and pull -queues. Tasks in push queues are always executing up to some throttlable limit. -Tasks in pull queues remain there until the queue is polled by code that is -running for some other reason. 
Essentially, push queues run their own tasks -while pull queues just enqueue data that is used by something else. -Many other parts of App Engine are implemented using task queues. For example, -[App Engine cron](https://cloud.google.com/appengine/docs/java/config/cron) adds -tasks to push queues at regularly scheduled intervals, and the [MapReduce +App Engine [task +queues](https://cloud.google.com/appengine/docs/java/taskqueue/) provide an +asynchronous way to enqueue tasks and then execute them on some kind of +schedule. There are two types of queues, push queues and pull queues. Tasks in +push queues are always executing up to some throttlable limit. Tasks in pull +queues remain there until the queue is polled by code that is running for some +other reason. Essentially, push queues run their own tasks while pull queues +just enqueue data that is used by something else. Many other parts of App Engine +are implemented using task queues. For example, [App Engine +cron](https://cloud.google.com/appengine/docs/java/config/cron) adds tasks to +push queues at regularly scheduled intervals, and the [MapReduce framework](https://cloud.google.com/appengine/docs/java/dataprocessing/) adds tasks for each phase of the MapReduce algorithm. @@ -183,12 +193,60 @@ explicitly marked as otherwise. spreadsheet. Tasks are enqueued by `RegistrarServlet` when changes are made to registrar fields and are executed by `SyncRegistrarsSheetAction`. +### Cron jobs + +Nomulus uses App Engine [cron +jobs](https://cloud.google.com/appengine/docs/java/config/cron) to run periodic +scheduled actions. These actions run as frequently as once per minute (in the +case of syncing DNS updates) or as infrequently as once per month (in the case +of RDE exports). Cron tasks are specified in `cron.xml` files, with one per +environment. There are more tasks that run in Production than in other +environments because tasks like uploading RDE dumps are only done for the live +system. 
Cron tasks execute on the `backend` service. + +Most cron tasks use the `TldFanoutAction` which is accessed via the +`/_dr/cron/fanout` URL path. This action, which is run by the BackendServlet on +the backend service, fans out a given cron task for each TLD that exists in the +registry system, using the queue that is specified in the `cron.xml` entry. +Because some tasks may be computationally intensive and could risk spiking +system latency if all start executing immediately at the same time, there is a +`jitterSeconds` parameter that spreads out tasks over the given number of +seconds. This is used with DNS updates and commit log deletion. + +The reason the `TldFanoutAction` exists is that a lot of tasks need to be done +separately for each TLD, such as RDE exports and NORDN uploads. It's simpler to +have a single cron entry that will create tasks for all TLDs than to have to +specify a separate cron task for each action for each TLD (though that is still +an option). Task queues also provide retry semantics in the event of transient +failures that a raw cron task does not. This is why there are some tasks that do +not fan out across TLDs that still use `TldFanoutAction` -- it's so that the +tasks retry in the face of transient errors. + +The full list of URL parameters to `TldFanoutAction` that can be specified in +cron.xml is: + +* `endpoint` -- The path of the action that should be executed (see + `web.xml`). +* `queue` -- The cron queue to enqueue tasks in. +* `forEachRealTld` -- Specifies that the task should be run in each TLD of + type `REAL`. This can be combined with `forEachTestTld`. +* `forEachTestTld` -- Specifies that the task should be run in each TLD of + type `TEST`. This can be combined with `forEachRealTld`. +* `runInEmpty` -- Specifies that the task should be run globally, i.e. just + once, rather than individually per TLD. This is provided to allow tasks to + retry. It is called "`runInEmpty`" for historical reasons. 
+* `excludes` -- A list of TLDs to exclude from processing. +* `jitterSeconds` -- The execution of each per-TLD task is delayed by a + different random number of seconds between zero and this max value. + ## Environments Nomulus comes pre-configured with support for a number of different environments, all of which are used in Google's registry system. Other registry operators may choose to use more or fewer environments, depending on their -needs. +needs. Each environment consists of a separate Google Cloud Platform project, +which includes a separate database and separate bulk storage in Cloud Storage. +Each environment is thus completely independent. The different environments are specified in `RegistryEnvironment`. Most correspond to a separate App Engine app except for `UNITTEST` and `LOCAL`, which @@ -243,49 +301,6 @@ of experience running a production registry using this codebase. errors, it can be pushed to Production. 5. Repeat once weekly, or potentially more often. -## Cron tasks - -All [cron tasks](https://cloud.google.com/appengine/docs/java/config/cron) are -specified in `cron.xml` files, with one per environment. There are more tasks -that execute in Production than in other environments, because tasks like -uploading RDE dumps are only done for the live system. Cron tasks execute on the -`backend` service. - -Most cron tasks use the `TldFanoutAction` which is accessed via the -`/_dr/cron/fanout` URL path. This action, which is run by the BackendServlet on -the backend service, fans out a given cron task for each TLD that exists in the -registry system, using the queue that is specified in the `cron.xml` entry. -Because some tasks may be computationally intensive and could risk spiking -system latency if all start executing immediately at the same time, there is a -`jitterSeconds` parameter that spreads out tasks over the given number of -seconds. This is used with DNS updates and commit log deletion. 
- -The reason the `TldFanoutAction` exists is that a lot of tasks need to be done -separately for each TLD, such as RDE exports and NORDN uploads. It's simpler to -have a single cron entry that will create tasks for all TLDs than to have to -specify a separate cron task for each action for each TLD (though that is still -an option). Task queues also provide retry semantics in the event of transient -failures that a raw cron task does not. This is why there are some tasks that do -not fan out across TLDs that still use `TldFanoutAction` -- it's so that the -tasks retry in the face of transient errors. - -The full list of URL parameters to `TldFanoutAction` that can be specified in -cron.xml is: - -* `endpoint` -- The path of the action that should be executed (see - `web.xml`). -* `queue` -- The cron queue to enqueue tasks in. -* `forEachRealTld` -- Specifies that the task should be run in each TLD of - type `REAL`. This can be combined with `forEachTestTld`. -* `forEachTestTld` -- Specifies that the task should be run in each TLD of - type `TEST`. This can be combined with `forEachRealTld`. -* `runInEmpty` -- Specifies that the task should be run globally, i.e. just - once, rather than individually per TLD. This is provided to allow tasks to - retry. It is called "`runInEmpty`" for historical reasons. -* `excludes` -- A list of TLDs to exclude from processing. -* `jitterSeconds` -- The execution of each per-TLD task is delayed by a - different random number of seconds between zero and this max value. - ## Cloud Datastore Nomulus uses [Cloud diff --git a/docs/configuration.md b/docs/configuration.md index f0249d147..7e6e717e8 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -12,8 +12,8 @@ updated by running `nomulus` commands without having to deploy a new version. 
Here's a checklist of things that need to be configured upon initial installation of the project: -* Create Google Cloud Storage buckets (see the [App Engine architecture - guide](./app-engine-architecture.md)). +* Create Google Cloud Storage buckets (see the [Architecture + documentation](./architecture.md) for more information). * Modify `ConfigModule.java` and set project-specific settings such as product name (see below). * Copy and edit `ProductionRegistryConfigExample.java` with your @@ -28,8 +28,8 @@ different values for different environments. This is especially pronounced in the `UNITTEST` and `LOCAL` environments, which don't run on App Engine at all. As an example, some timeouts may be long in production and short in unit tests. -See the [App Engine architecture](./app-engine-architecture.md) documentation -for more details on environments as used by Nomulus. +See the [Architecture documentation](./architecture.md) for more details on +environments as used by Nomulus. ## App Engine configuration diff --git a/docs/install.md b/docs/install.md index 663a89503..752427e06 100644 --- a/docs/install.md +++ b/docs/install.md @@ -106,8 +106,8 @@ Cloud Platform. Make sure to choose a good Project ID, as it will be used repeatedly in a large number of places. If your company is named Acme, then a good Project ID for your production environment would be "acme-registry". Keep in mind that project IDs for non-production environments should be suffixed with -the name of the environment (see the [App Engine architecture -guide](./app-engine-architecture.md) for more details). For the purposes of this +the name of the environment (see the [Architecture +documentation](./architecture.md) for more details). For the purposes of this example we'll deploy to the "alpha" environment, which is used for developer testing. The Project ID will thus be `acme-registry-alpha`. 
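As a rough illustration of the `TldFanoutAction` URL parameters documented in the `docs/architecture.md` part of this patch, the fan-out and jitter behavior might be sketched as follows. This is only a minimal Python sketch, not the actual Nomulus implementation (which is a Java action). The names `Tld`, `FanoutTask`, and `fan_out` are hypothetical, as are the endpoint and queue strings in the usage note. The sketch also assumes that `runInEmpty` simply adds one global task alongside any per-TLD tasks, which the text does not specify.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Tld:
    """Hypothetical stand-in for a TLD record; tld_type is "REAL" or "TEST"."""
    name: str
    tld_type: str


@dataclass(frozen=True)
class FanoutTask:
    """One task to enqueue; tld is None for a single global task."""
    endpoint: str
    queue: str
    tld: Optional[str]
    delay_seconds: int


def fan_out(
    tlds: Iterable[Tld],
    endpoint: str,
    queue: str,
    for_each_real_tld: bool = False,
    for_each_test_tld: bool = False,
    run_in_empty: bool = False,
    excludes: Iterable[str] = (),
    jitter_seconds: int = 0,
    rng: Optional[random.Random] = None,
) -> List[FanoutTask]:
    """Build the list of tasks that a single cron entry would fan out to."""
    rng = rng or random.Random()
    excluded = set(excludes)

    # Select which TLD types take part in the fan-out.
    wanted_types = set()
    if for_each_real_tld:
        wanted_types.add("REAL")
    if for_each_test_tld:
        wanted_types.add("TEST")

    targets: List[Optional[str]] = [
        t.name
        for t in tlds
        if t.tld_type in wanted_types and t.name not in excluded
    ]
    if run_in_empty:
        # Assumption: the global task is represented by a None target.
        targets.append(None)

    # Each task gets its own random delay between 0 and jitter_seconds.
    return [
        FanoutTask(endpoint, queue, target, rng.randint(0, jitter_seconds))
        for target in targets
    ]
```

For example, `fan_out(tlds, "/_dr/task/hypotheticalAction", "hypothetical-queue", for_each_real_tld=True, jitter_seconds=60)` would yield one task per `REAL` TLD, each delayed by a random 0 to 60 seconds, so that the tasks do not all start at the same moment.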
diff --git a/docs/rde-import-usage.md b/docs/rde-import-usage.md index 457c07f81..a953b7d15 100644 --- a/docs/rde-import-usage.md +++ b/docs/rde-import-usage.md @@ -25,8 +25,8 @@ the [first steps tutorial](./first-steps-tutorial.md). ## How to load an escrow file First of all, ensure that all of the cloud storage buckets are set up for -nomulus. See the [architecture documentation](./app-engine-architecture.md) for -details. The escrow file that will be imported should be uploaded to the +nomulus. See the [Architecture documentation](./architecture.md) for details. +The escrow file that will be imported should be uploaded to the `PROJECT-rde-import` cloud storage bucket. The escrow file should not be compressed or encrypted. When launching each mapreduce job, reference the absolute path to the file (just the path, not the bucket name) in the `path`