Use smaller shard size in ClaimsListShardTest

The default production value of 10,000 was unnecessarily large for testing
purposes.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=132441792
mcilwain 2016-09-07 08:58:15 -07:00 committed by Ben McIlwain
parent 4652688585
commit cadf9d4af2
9 changed files with 17 additions and 5 deletions


@ -1,427 +0,0 @@
# App Engine architecture
This document contains information on the overall architecture of the Domain
Registry project as it is implemented in App Engine.
## Services
The Domain Registry contains three
[services](https://cloud.google.com/appengine/docs/python/an-overview-of-app-engine),
which were previously called modules in earlier versions of App Engine. The
services are: default (also called front-end), backend, and tools. Each service
runs largely independently: services can be upgraded individually, and each has
its own log output, servers, and scaling configuration.
Once you have your app deployed and running, the default service can be accessed
at `https://project-id.appspot.com`, substituting whatever your App Engine app
is named for "project-id". Note that this is the URL for the production
instance of your app; other environments will have the environment name appended
with a hyphen in the hostname, e.g. `https://project-id-sandbox.appspot.com`.
The URL for the backend service is `https://backend-dot-project-id.appspot.com`
and the URL for the tools service is `https://tools-dot-project-id.appspot.com`.
The dot is written as `-dot-` rather than forming a further subdomain because
the SSL certificate for `appspot.com` is only valid for `*.appspot.com` (no
double wildcards).
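The URL scheme above can be sketched as a small helper. This is only an illustration: `ServiceUrls` and `urlFor` are hypothetical names that do not appear in the codebase, and the helper assumes the caller passes the full App Engine app name (including any environment suffix).

```java
/** Sketch of the URL scheme described above; the class and method names are illustrative only. */
final class ServiceUrls {
  private ServiceUrls() {}

  /**
   * Returns the appspot.com URL for a service of the given App Engine app.
   * The default service has no prefix; other services use the "-dot-" form.
   */
  static String urlFor(String service, String appId) {
    String host = service.equals("default")
        ? appId + ".appspot.com"
        : service + "-dot-" + appId + ".appspot.com";
    return "https://" + host;
  }

  public static void main(String[] args) {
    System.out.println(urlFor("default", "project-id"));
    System.out.println(urlFor("backend", "project-id"));
    System.out.println(urlFor("tools", "project-id"));
  }
}
```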
### Default service
The default service is responsible for all registrar-facing
[EPP](https://en.wikipedia.org/wiki/Extensible_Provisioning_Protocol) command
traffic, all user-facing WHOIS and RDAP traffic, and the admin and registrar web
consoles, and is thus the most important service. If the service has any
problems and goes down or stops servicing requests in a timely manner, it will
begin to impact users immediately. Requests to the default service are handled
by the `FrontendServlet`, which provides all of the endpoints exposed in
`FrontendRequestComponent`.
### Backend service
The backend service is responsible for executing all regularly scheduled
background tasks (using cron) as well as all asynchronous tasks. Requests to
the backend service are handled by the `BackendServlet`, which provides all of
the endpoints exposed in `BackendRequestComponent`. These include tasks for
generating/exporting RDE, syncing the trademark list from TMDB, exporting
backups, writing out DNS updates, handling asynchronous contact and host
deletions, writing out commit logs, exporting metrics to BigQuery, and many
more. Issues in the backend service will not immediately be apparent to end
users, but the longer it is down, the more obvious it will become that
user-visible tasks such as DNS and deletion are not being handled in a timely
manner.
The backend service is also where all MapReduces run, which includes some of the
aforementioned tasks such as RDE and asynchronous resource deletion, as well as
any one-off data migration MapReduces. Consequently, the backend service should
be sized to support not just the normal ongoing DNS load but also the load
incurred by MapReduces, both scheduled (such as RDE) and on-demand (asynchronous
contact/host deletion).
### Tools service
The tools service is responsible for servicing requests from the `registry_tool`
command line tool, which provides administrative-level functionality for
developers and tech support employees of the registry. It is thus the least
critical of the three services. Requests to the tools service are handled by
the `ToolsServlet`, which provides all of the endpoints exposed in
`ToolsRequestComponent`. Some example functionality that this service provides
includes the server-side code to update premium lists, run EPP commands from the
tool, and manually modify contacts, hosts, domains, and other resources. Problems
with the tools service are not visible to users.
## Task queues
[Task queues](https://cloud.google.com/appengine/docs/java/taskqueue/) in App
Engine provide an asynchronous way to enqueue tasks and then execute them on
some kind of schedule. There are two types of queues: push queues and pull
queues. Tasks in push queues are executed automatically, up to a configurable
rate limit. Tasks in pull queues remain there indefinitely until the queue is
polled by code that is running for some other reason. Essentially, push queues run their own
tasks while pull queues just enqueue data that is used by something else. Many
other parts of App Engine are implemented using task queues. For example,
[App Engine cron](https://cloud.google.com/appengine/docs/java/config/cron) adds
tasks to push queues at regularly scheduled intervals, and the
[MapReduce framework](https://cloud.google.com/appengine/docs/java/dataprocessing/)
adds tasks for each phase of the MapReduce algorithm.
The Domain Registry project uses a particular pattern of paired push/pull queues
that is worth explaining in detail. Push queues are essential because App
Engine's architecture does not support long-running background processes, so
push queues are the fundamental building block that allows asynchronous and
background execution of code that is not in response to incoming web requests.
However, they are limited in that they do not allow batch processing or
grouping. That's where the pull queue comes in. Regularly scheduled tasks
in the push queue will, upon execution, poll the corresponding pull queue for a
specified number of tasks and execute them in a batch. This allows the code to
execute in the background while taking advantage of batch processing.
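The paired push/pull pattern can be illustrated with a minimal in-memory sketch. It deliberately uses no App Engine APIs: a plain `Queue` stands in for the pull queue, and `leaseBatch` plays the role of the scheduled push task that polls it. The class and method names are hypothetical, and grouping by TLD is just one example of the batch processing that this pattern makes possible.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

/** In-memory stand-in for the paired push/pull pattern; no App Engine APIs are used. */
final class PullBatchSketch {
  private PullBatchSketch() {}

  /** Removes up to maxTasks items from the pull queue, as a scheduled push task would. */
  static List<String> leaseBatch(Queue<String> pullQueue, int maxTasks) {
    List<String> batch = new ArrayList<>();
    while (batch.size() < maxTasks && !pullQueue.isEmpty()) {
      batch.add(pullQueue.remove());
    }
    return batch;
  }

  /**
   * Groups leased domain names by the label after the last dot.
   * Treating that label as the TLD is a simplification made for this sketch.
   */
  static Map<String, List<String>> groupByTld(List<String> domainNames) {
    Map<String, List<String>> byTld = new TreeMap<>();
    for (String name : domainNames) {
      String tld = name.substring(name.lastIndexOf('.') + 1);
      byTld.computeIfAbsent(tld, k -> new ArrayList<>()).add(name);
    }
    return byTld;
  }

  public static void main(String[] args) {
    Queue<String> pullQueue =
        new ArrayDeque<>(Arrays.asList("a.example", "b.test", "c.example"));
    // One run of the scheduled push task: lease at most two tasks and process them together.
    System.out.println(groupByTld(leaseBatch(pullQueue, 2)));
    System.out.println("left in pull queue: " + pullQueue);
  }
}
```

Tasks that are not leased in one run simply stay in the pull queue until the next scheduled run picks them up.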
Particulars on the task queues in use by the Domain Registry project are
specified in the `queue.xml` file. Note that many push queues have a direct
one-to-one correspondence with entries in `cron.xml` because they need to be
fanned-out on a per-TLD or other basis (see the Cron section below for more
explanation). The exact queue that a given cron task will use is passed as the
query string parameter "queue" in the url specification for the cron task.
Here are the task queues in use by the system. All are push queues unless
explicitly marked as otherwise.
* `bigquery-streaming-metrics` -- Queue for metrics that are asynchronously
streamed to BigQuery in the `Metrics` class. Tasks are enqueued during EPP
flows in `EppController`. This means that there is a lag of a few seconds to
a few minutes between when metrics are generated and when they are queryable
in BigQuery, but this is preferable to slowing all EPP flows down and blocking
them on BigQuery streaming.
* `brda` -- Queue for tasks to upload weekly Bulk Registration Data Access
(BRDA) files to a location where they are available to ICANN. The
`RdeStagingReducer` (part of the RDE MapReduce) creates these tasks at the end
of generating an RDE dump.
* `delete-commits` -- Cron queue for tasks to regularly delete commit logs that
are more than thirty days stale. These tasks execute the
`DeleteOldCommitLogsAction`.
* `dns-cron` (cron queue) and `dns-pull` (pull queue) -- A push/pull pair of
queues. Cron enqueues a task in `dns-cron` each minute. Each task is executed
by `ReadDnsQueueAction`, which leases a batch of tasks from the pull queue,
groups them by TLD, and writes each group as a single task to `dns-publish`
to be published to the configured DNS writer for the TLD.
* `dns-publish` -- Queue for batches of DNS updates to be pushed to DNS writers.
* `export-bigquery-poll` -- Queue for tasks to query the success/failure of a
given BigQuery export job. Tasks are enqueued by `BigqueryPollJobAction`.
* `export-commits` -- Queue for tasks to export commit log checkpoints. Tasks
are enqueued by `CommitLogCheckpointAction` (which is run every minute by
cron) and executed by `ExportCommitLogDiffAction`.
* `export-reserved-terms` -- Cron queue for tasks to export the list of reserved
terms for each TLD. The tasks are executed by `ExportReservedTermsAction`.
* `export-snapshot` -- Cron and push queue for tasks to load a Datastore
snapshot that was stored in Google Cloud Storage and export it to BigQuery.
Tasks are enqueued by both cron and `CheckSnapshotServlet` and are executed by
both `ExportSnapshotServlet` and `LoadSnapshotAction`.
* `export-snapshot-poll` -- Queue for tasks to check that a Datastore snapshot
has been successfully uploaded to Google Cloud Storage (this is an
asynchronous background operation that can take an indeterminate amount of
time). Once the snapshot is successfully uploaded, it is imported into
BigQuery. Tasks are enqueued by `ExportSnapshotServlet` and executed by
`CheckSnapshotServlet`.
* `export-snapshot-update-view` -- Queue for tasks to update the BigQuery views
to point to the most recently uploaded snapshot. Tasks are enqueued by
`LoadSnapshotAction` and executed by `UpdateSnapshotViewAction`.
* `flows-async` -- Queue for asynchronous tasks that are enqueued during EPP
command flows. Currently all of these tasks correspond to invocations of any
of the following three MapReduces: `DnsRefreshForHostRenameAction`,
`DeleteHostResourceAction`, or `DeleteContactResourceAction`.
* `group-members-sync` -- Cron queue for tasks to sync registrar contacts (not
domain contacts!) to Google Groups. Tasks are executed by
`SyncGroupMembersAction`.
* `load[0-9]` -- Queues used to load-test the system by `LoadTestAction`. These
queues don't need to exist except when actively running load tests (which is
not recommended on production environments). There are ten of these queues to
provide simple sharding, because the Domain Registry system is capable of
handling significantly more Queries Per Second than the highest throttle limit
available on task queues (which is 500 qps).
* `lordn-claims` and `lordn-sunrise` -- Pull queues for handling LORDN exports.
Tasks are enqueued synchronously during EPP commands depending on whether the
domain name in question has a claims notice ID.
* `marksdb` -- Queue for tasks to verify that an upload to NORDN was
successfully received and verified. These tasks are enqueued by
`NordnUploadAction` following an upload and are executed by
`NordnVerifyAction`.
* `nordn` -- Cron queue used for NORDN exporting. Tasks are executed by
`NordnUploadAction`, which pulls LORDN data from the `lordn-claims` and
`lordn-sunrise` pull queues (above).
* `rde-report` -- Queue for tasks to upload RDE reports to ICANN following
successful upload of full RDE files to the escrow provider. Tasks are
enqueued by `RdeUploadAction` and executed by `RdeReportAction`.
* `rde-upload` -- Cron queue for tasks to upload already-generated RDE files
from Cloud Storage to the escrow provider. Tasks are executed by
`RdeUploadAction`.
* `sheet` -- Queue for tasks to sync registrar updates to a Google Sheets
spreadsheet. Tasks are enqueued by `RegistrarServlet` when changes are made
to registrar fields and are executed by `SyncRegistrarsSheetAction`.
## Environments
The domain registry codebase comes pre-configured with support for a number of
different environments, all of which are used in Google's registry system.
Other registry operators may choose to use more or fewer environments,
depending on their needs.
The different environments are specified in `RegistryEnvironment`. Most
correspond to a separate App Engine app except for `UNITTEST` and `LOCAL`, which
by their nature do not use real environments running in the cloud. The
recommended naming scheme for the App Engine apps, which is the most compatible
with the codebase and thus requires the least configuration, is to pick a name
for the production app and then suffix it for the other environments. For
example, if the production app is named 'registry-platform', then the sandbox
app would be named 'registry-platform-sandbox'.
The full list of environments supported out-of-the-box, in descending order from
real to not, is:
* `PRODUCTION` -- The real production environment that is actually running live
TLDs. Since the Domain Registry is a shared registry platform, there need
only ever be one of these.
* `SANDBOX` -- A playground environment for external users to test commands in
without the possibility of affecting production data. This is the environment
new registrars go through
[OT&E](https://www.icann.org/resources/unthemed-pages/registry-agmt-appc-e-2001-04-26-en)
in. Sandbox is also useful as a final sanity check to push a new prospective
build to and allow it to "bake" before pushing it to production.
* `QA` -- An internal environment used by business users to play with and sign
off on new features to be released. This environment can be pushed to
frequently and is where manual testers should be spending the majority of
their time.
* `CRASH` -- Another environment similar to QA, except with no expectations of
data preservation. Crash is used for testing of backup/restore (which brings
the entire system down until it is completed) without affecting the QA
environment.
* `ALPHA` -- The developers' playground. Experimental builds are routinely
pushed here in order to test them on a real app running on App Engine. You
may end up wanting multiple environments like Alpha if you regularly
experience contention (i.e. developers being blocked from testing their code
on Alpha because others are already using it).
* `LOCAL` -- A fake environment that is used when running the app locally on a
simulated App Engine instance.
* `UNITTEST` -- A fake environment that is used in unit tests, where everything
in the App Engine stack is simulated or mocked.
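The recommended naming scheme can be sketched as follows. `EnvironmentSketch` is a hypothetical stand-in, not the real `RegistryEnvironment`, and it assumes that every non-production cloud environment uses its lowercased name as the suffix; the text above only confirms this for Sandbox.

```java
import java.util.Locale;

/** Hypothetical mirror of the environments listed above; not the real RegistryEnvironment. */
enum EnvironmentSketch {
  PRODUCTION, SANDBOX, QA, CRASH, ALPHA, LOCAL, UNITTEST;

  /** LOCAL and UNITTEST do not run as real apps in the cloud. */
  boolean hasCloudApp() {
    return this != LOCAL && this != UNITTEST;
  }

  /** Production uses the bare name; other cloud environments append "-" plus their name (assumed). */
  String appName(String productionAppName) {
    if (!hasCloudApp()) {
      throw new IllegalStateException(this + " has no App Engine app");
    }
    return this == PRODUCTION
        ? productionAppName
        : productionAppName + "-" + name().toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    for (EnvironmentSketch env : values()) {
      String app = env.hasCloudApp() ? env.appName("registry-platform") : "(none)";
      System.out.println(env + " -> " + app);
    }
  }
}
```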
## Release process
The following is a recommended release process based on Google's several years
of experience running a production registry using this codebase.
1. Developers write code and associated unit tests verifying that the new code
works properly.
2. New features or potentially risky bug fixes are pushed to Alpha and tested by
the developers before being committed to the source code repository.
3. New builds are cut and first pushed to Sandbox.
4. Once a build has been running successfully in Sandbox for a day with no
errors, it can be pushed to Production.
5. Repeat once weekly, or potentially more often.
## Cron tasks
All [cron tasks](https://cloud.google.com/appengine/docs/java/config/cron) are
specified in `cron.xml` files, with one per environment. There are more tasks
that execute in Production than in other environments, because tasks like
uploading RDE dumps are only done for the live system. Cron tasks execute on
the `backend` service.
Most cron tasks use the `TldFanoutAction` which is accessed via the
`/_dr/cron/fanout` URL path. This action, which is run by the BackendServlet on
the backend service, fans out a given cron task for each TLD that exists in the
registry system, using the queue that is specified in the `cron.xml` entry.
Because some tasks may be computationally intensive and could risk spiking
system latency if all start executing immediately at the same time, there is a
`jitterSeconds` parameter that spreads out tasks over the given number of
seconds. This is used with DNS updates and commit log deletion.
The reason the `TldFanoutAction` exists is that a lot of tasks need to be done
separately for each TLD, such as RDE exports and NORDN uploads. It's simpler to
have a single cron entry that will create tasks for all TLDs than to have to
specify a separate cron task for each action for each TLD (though that is still
an option). Task queues also provide retry semantics in the event of transient
failures that a raw cron task does not. This is why there are some tasks that
do not fan out across TLDs that still use `TldFanoutAction` -- it's so that the
tasks retry in the face of transient errors.
The full list of URL parameters to `TldFanoutAction` that can be specified in
`cron.xml` is:
* `endpoint` -- The path of the action that should be executed (see `web.xml`).
* `queue` -- The cron queue to enqueue tasks in.
* `forEachRealTld` -- Specifies that the task should be run in each TLD of type
`REAL`. This can be combined with `forEachTestTld`.
* `forEachTestTld` -- Specifies that the task should be run in each TLD of type
`TEST`. This can be combined with `forEachRealTld`.
* `runInEmpty` -- Specifies that the task should be run globally, i.e. just
once, rather than individually per TLD. This is provided to allow tasks to
retry. It is called "`runInEmpty`" for historical reasons.
* `excludes` -- A list of TLDs to exclude from processing.
* `jitterSeconds` -- The execution of each per-TLD task is delayed by a
different random number of seconds between zero and this max value.
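The interaction of `excludes` and `jitterSeconds` can be sketched as below. This is not the real `TldFanoutAction`: `FanoutSketch` and `planTasks` are hypothetical names, and the sketch only computes a delay per TLD instead of enqueuing anything.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Illustrative fan-out planner; it does not reproduce the real TldFanoutAction. */
final class FanoutSketch {
  private FanoutSketch() {}

  /**
   * Returns a delay in seconds for each TLD that should receive a task.
   * Excluded TLDs are skipped; each delay is drawn uniformly from [0, jitterSeconds].
   */
  static Map<String, Integer> planTasks(
      List<String> tlds, Set<String> excludes, int jitterSeconds, Random random) {
    Map<String, Integer> delays = new LinkedHashMap<>();
    for (String tld : tlds) {
      if (excludes.contains(tld)) {
        continue;
      }
      // nextInt(bound) excludes the bound, so add one to make jitterSeconds reachable.
      delays.put(tld, random.nextInt(jitterSeconds + 1));
    }
    return delays;
  }

  public static void main(String[] args) {
    Map<String, Integer> plan = planTasks(
        Arrays.asList("example", "test", "invalid"),
        new HashSet<>(Arrays.asList("invalid")),
        60,
        new Random());
    System.out.println(plan);
  }
}
```

Drawing an independent delay per TLD is what spreads the load: with many TLDs and a jitter of 60 seconds, the tasks start at scattered times across the minute rather than all at once.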
## Cloud Datastore
The Domain Registry platform uses
[Cloud Datastore](https://cloud.google.com/appengine/docs/java/datastore/) as
its primary database. Cloud Datastore is a NoSQL document database that
provides automatic horizontal scaling, high performance, and high availability.
All information that is persisted to Cloud Datastore takes the form of Java
classes annotated with `@Entity` that are located in the `model` package. The
[Objectify library](https://cloud.google.com/appengine/docs/java/gettingstarted/using-datastore-objectify)
is used to persist instances of these classes in a format that Datastore
understands.
A brief overview of the different entity types found in the App Engine Datastore
Viewer may help administrators understand what they are seeing. Note that some
of these entities are part of App Engine tools that are outside of the domain
registry codebase:
* `_AE_*` -- These entities are created by App Engine.
* `_ah_SESSION` -- These entities track App Engine client sessions.
* `_GAE_MR_*` -- These entities are generated by App Engine while running
MapReduces.
* `BackupStatus` -- There should only be one of these entities, used to maintain
the state of the backup process.
* `Cancellation` -- A cancellation is a special type of billing event which
represents the cancellation of another billing event such as a OneTime or
Recurring.
* `ClaimsList`, `ClaimsListShard`, and `ClaimsListSingleton` -- These entities
store the TMCH claims list, for use in trademark processing.
* `CommitLog*` -- These entities store the commit log information.
* `ContactResource` -- These hold the ICANN contact information (but not
registrar contacts, who have a separate entity type).
* `Cursor` -- We use Cursor entities to maintain state about daily processes,
remembering which dates have been processed. For instance, for the RDE export,
Cursor entities maintain the date up to which each TLD has been exported.
* `DomainApplicationIndex` -- These hold domain applications received during the
sunrise period.
* `DomainBase` -- These hold the ICANN domain information.
* `DomainRecord` -- These are used during the DNS update process.
* `EntityGroupRoot` -- There is only one EntityGroupRoot entity, which serves as
the Datastore parent of many other entities.
* `EppResourceIndex` -- These entities allow enumeration of EPP resources (such
as domains, hosts and contacts), which would otherwise be difficult to do in
Datastore.
* `ExceptionReportEntity` -- These entities are generated automatically by
ECatcher, a Google-internal logging and debugging tool. Non-Google users
should not encounter these entries.
* `ForeignKeyContactIndex`, `ForeignKeyDomainIndex`, and `ForeignKeyHostIndex`
-- These act as a unique index on contacts, domains and hosts, allowing
transactional lookup by foreign key.
* `HistoryEntry` -- A HistoryEntry is the record of a command which mutated an
EPP resource. It serves as the parent of BillingEvents and PollMessages.
* `HostRecord` -- These are used during the DNS update process.
* `HostResource` -- These hold the ICANN host information.
* `Lock` -- Lock entities are used to control access to a shared resource such
as an App Engine queue. Under ordinary circumstances, these locks will be
cleaned up automatically, and should not accumulate.
* `LogsExportCursor` -- This is a single entity which maintains the state of log
export.
* `MR-*` -- These entities are generated by the App Engine MapReduce library in
the course of running MapReduces.
* `Modification` -- A Modification is a special type of billing event which
represents the modification of a OneTime billing event.
* `OneTime` -- A OneTime is a billing event which represents a one-time charge
or credit to the client (as opposed to Recurring).
* `pipeline-*` -- These entities are also generated by the App Engine MapReduce
library.
* `PollMessage` -- PollMessages are generated by the system to notify registrars
of asynchronous responses and status changes.
* `PremiumList`, `PremiumListEntry`, and `PremiumListRevision` -- The standard
method for determining which domain names receive premium pricing is to
maintain a static list of premium names. Each PremiumList contains some number
of PremiumListRevisions, each of which in turn contains a PremiumListEntry for
each premium name.
* `RdeRevision` -- These entities are used by the RDE subsystem in the process
of generating files.
* `Recurring` -- A Recurring is a billing event which represents a recurring
charge to the client (as opposed to OneTime).
* `Registrar` -- These hold information about client registrars.
* `RegistrarContact` -- Registrars have contacts just as domains do. These are
stored in a special RegistrarContact entity.
* `RegistrarCredit` and `RegistrarCreditBalance` -- The system supports the
concept of a registrar credit balance, which is a pool of credit that the
registrar can use to offset amounts they owe. This might come from promotions,
for instance. These entities maintain registrars' balances.
* `Registry` -- These hold information about the TLDs supported by the Registry
system.
* `RegistryCursor` -- These entities are the predecessor to the Cursor
entities. We are no longer using them, and will be deleting them soon.
* `ReservedList` -- Each ReservedList entity represents an entire list of
reserved names which cannot be registered. Each TLD can have one or more
attached reserved lists.
* `ServerSecret` -- This is a single entity containing the secret numbers used
for generating tokens such as XSRF tokens.
* `SignedMarkRevocationList` -- These entities together contain the Signed Mark
Data Revocation List file downloaded from the TMCH MarksDB each day. Each
entity contains up to 10,000 rows of the file, so depending on the size of the
file, there will be some handful of entities.
* `TmchCrl` -- This is a single entity containing ICANN's TMCH CA Certificate
Revocation List.
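Several of the entities above (for example `ClaimsListShard` and `SignedMarkRevocationList`) split one large list across multiple entities. The general idea can be sketched with a generic helper; `ShardSketch` is a hypothetical name, and the real entity classes may shard differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Generic sharding helper; the real entity classes are not shown here. */
final class ShardSketch {
  private ShardSketch() {}

  /** Splits rows into consecutive shards holding at most shardSize rows each. */
  static <T> List<List<T>> shard(List<T> rows, int shardSize) {
    if (shardSize <= 0) {
      throw new IllegalArgumentException("shardSize must be positive");
    }
    List<List<T>> shards = new ArrayList<>();
    for (int start = 0; start < rows.size(); start += shardSize) {
      int end = Math.min(start + shardSize, rows.size());
      shards.add(new ArrayList<>(rows.subList(start, end)));
    }
    return shards;
  }

  public static void main(String[] args) {
    List<Integer> rows = new ArrayList<>();
    for (int i = 0; i < 25_000; i++) {
      rows.add(i);
    }
    // With 10,000 rows per shard, 25,000 rows need three shards.
    System.out.println(shard(rows, 10_000).size());
  }
}
```

Making the shard size a parameter also lets a test exercise multi-shard behavior with a handful of rows instead of tens of thousands.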
## Cloud Storage buckets
The Domain Registry platform uses
[Cloud Storage](https://cloud.google.com/storage/) for bulk storage of large
flat files that aren't suitable for Datastore. These files include backups, RDE
exports, Datastore snapshots (for ingestion into BigQuery), and reports. Each
bucket name must be unique across all of Google Cloud Storage, so we use the
common recommended pattern of prefixing all buckets with the name of the App
Engine app (which is itself globally unique). Most of the bucket names are
configurable, but the defaults are as follows, with PROJECT standing in as a
placeholder for the App Engine app name:
* `PROJECT-billing` -- Monthly invoice files for each registrar.
* `PROJECT-commits` -- Daily exports of commit logs that are needed for
potentially performing a restore.
* `PROJECT-domain-lists` -- Daily exports of all registered domain names per
TLD.
* `PROJECT-gcs-logs` -- This bucket is used at Google to store the GCS access
logs and storage data. This bucket is not required by the Registry system,
but can provide useful logging information. For instructions on setup, see
the
[Cloud Storage documentation](https://cloud.google.com/storage/docs/access-logs).
* `PROJECT-icann-brda` -- This bucket contains the weekly ICANN BRDA files.
There is no lifecycle expiration; we keep a history of all the files. This
bucket must exist for the BRDA process to function.
* `PROJECT-icann-zfa` -- This bucket contains the most recent ICANN ZFA
files. No lifecycle is needed, because the files are overwritten each time.
* `PROJECT-rde` -- This bucket contains RDE exports, which should then be
regularly uploaded to the escrow provider. Lifecycle is set to 90 days. The
bucket must exist.
* `PROJECT-reporting` -- Contains monthly ICANN reporting files.
* `PROJECT-snapshots` -- Contains daily exports of Datastore entities of types
defined in `ExportConstants.java`. These are imported into BigQuery daily to
allow for in-depth querying.
* `PROJECT.appspot.com` -- Temporary MapReduce files are stored here. By
default, the App Engine MapReduce library places its temporary files in a
bucket named {project}.appspot.com. This bucket must exist. To keep temporary
files from building up, a 90-day or 180-day lifecycle should be applied to the
bucket, depending on how long you want to be able to go back and debug
MapReduce problems. At 30 GB per day of generated temporary files, this bucket
may be the largest consumer of storage, so only save what you actually use.
## Commit logs
## Web.xml
## Cursors


@ -1,3 +0,0 @@
# Code structure
An overall look at the structure of the Domain Registry code.


@ -1,122 +0,0 @@
# Configuration
There are multiple different kinds of configuration that go into getting a
working registry system up and running. Broadly speaking, configuration works
in two ways -- globally, for the entire system, and per-TLD. Global
configuration is managed by editing code and deploying a new version, whereas
per-TLD configuration is data that lives in Datastore in `Registry` entities,
and is updated by running `registry_tool` commands without having to deploy a
new version.
## Environments
Before getting into the details of configuration, it's important to note that a
lot of configuration is environment-dependent. It is common to see `switch`
statements that operate on the current `RegistryEnvironment`, and return
different values for different environments. This is especially pronounced in
the `UNITTEST` and `LOCAL` environments, which don't run on App Engine at all.
As an example, some timeouts may be long in production and short in unit tests.
See the "App Engine architecture" documentation for more details on environments
as used in the domain registry.
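An environment-dependent setting of the kind described above might look like the following sketch. Everything here is made up for illustration: the `Env` enum is a reduced stand-in for `RegistryEnvironment`, and the setting name and timeout values are not taken from the codebase.

```java
/** Hypothetical example of an environment-dependent setting; names and values are made up. */
final class TimeoutConfigSketch {
  private TimeoutConfigSketch() {}

  /** Stand-in for the real RegistryEnvironment enum, reduced to three values. */
  enum Env { PRODUCTION, LOCAL, UNITTEST }

  /** Returns a long timeout for production and short ones for the simulated environments. */
  static int lockTimeoutSeconds(Env env) {
    switch (env) {
      case PRODUCTION:
        return 60;
      case LOCAL:
        return 5;
      case UNITTEST:
        return 1;
      default:
        throw new IllegalArgumentException("Unknown environment: " + env);
    }
  }

  public static void main(String[] args) {
    for (Env env : Env.values()) {
      System.out.println(env + ": " + lockTimeoutSeconds(env) + "s");
    }
  }
}
```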
## App Engine configuration
App Engine configuration isn't covered in depth in this document as it is
thoroughly documented in the [App Engine configuration docs][app-engine-config].
The main files of note that come pre-configured along with the domain registry
are:
* `cron.xml` -- Configuration of cronjobs
* `web.xml` -- Configuration of URL paths on the webserver
* `appengine-web.xml` -- Overall App Engine settings including number and type
of instances
* `datastore-indexes.xml` -- Configuration of entity indexes in Datastore
* `queue.xml` -- Configuration of App Engine task queues
* `application.xml` -- Configuration of the application name and its services
Cron, web, and queue are covered in more detail in the "App Engine architecture"
doc, and the rest are covered in the general App Engine documentation.
If you are not writing new code to implement custom features, it is unlikely that
you will need to make any modifications beyond simple changes to
`application.xml` and `appengine-web.xml`. If you are writing new features,
it's likely you'll need to add cronjobs, URL paths, Datastore indexes, and task
queues, and thus edit those associated XML files.
## Global configuration
There are two different mechanisms by which global configuration is managed:
`RegistryConfig` (the old way) and `ConfigModule` (the new way). Ideally there
would just be one, but the required code cleanup hasn't been completed yet.
If you are adding new options, prefer adding them to `ConfigModule`.
**`RegistryConfig`** is an interface, of which you write an implementing class
containing the configuration values. `RegistryConfigLoader` is the class that
provides the instance of `RegistryConfig`, and defaults to returning
`ProductionRegistryConfigExample`. In order to create a configuration specific
to your registry, we recommend copying the `ProductionRegistryConfigExample`
class to a new class that will not be shared publicly, setting the
`com.google.domain.registry.config` system property in `appengine-web.xml` to
the fully qualified class name of that new class so that `RegistryConfigLoader`
will load it instead, and then editing said new class to add your specific
configuration options.
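The general mechanism of selecting a configuration class through a system property can be sketched as follows. This is an assumption about how such a loader could work, not a copy of `RegistryConfigLoader`: the `Config` interface, `ExampleConfig`, and `load` are hypothetical, and only the property name `com.google.domain.registry.config` comes from the text above.

```java
/** Sketch of loading a config class named by a system property; not the real RegistryConfigLoader. */
final class ConfigLoaderSketch {
  private ConfigLoaderSketch() {}

  /** Minimal stand-in for the configuration interface. */
  interface Config {
    String projectId();
  }

  /** Default used when the property is unset, playing the role of the example config. */
  static final class ExampleConfig implements Config {
    @Override
    public String projectId() {
      return "example-project";
    }
  }

  /** Instantiates the class named by the property, falling back to ExampleConfig. */
  static Config load(String propertyName) {
    String className = System.getProperty(propertyName, ExampleConfig.class.getName());
    try {
      return (Config) Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not load config class " + className, e);
    }
  }

  public static void main(String[] args) {
    // With the property unset, the example configuration is returned.
    System.out.println(load("com.google.domain.registry.config").projectId());
  }
}
```

The benefit of this indirection is that a private configuration class can live outside the shared source tree while the loading code stays unchanged.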
The `RegistryConfig` class has documentation on all of the methods that should
be sufficient to explain what each option is, and
`ProductionRegistryConfigExample` provides an example value for each one. Some
example configuration options in this interface include the App Engine project
ID, the number of days to retain commit logs, the names of various Cloud Storage
buckets, and URLs for some required services both external and internal.
**`ConfigModule`** is a Dagger module that provides injectable configuration
options (some of which come from `RegistryConfig` above, but most of which do
not). This is preferred over `RegistryConfig` for new configuration options
because being able to inject configuration options is a nicer pattern that makes
for cleaner code. Some configuration options that can be changed in this class
include timeout lengths and buffer sizes for various tasks, email addresses and
URLs to use for various services, more Cloud Storage bucket names, and WHOIS
disclaimer text.
## Sensitive global configuration
Some configuration values, such as PGP private keys, are so sensitive that they
should not be written in code as per the configuration methods above, as that
would pose too high a risk of them accidentally being leaked, e.g. in a source
control mishap. We use a secret store to persist these values in a secure
manner, and abstract access to them using the `Keyring` interface.
The `Keyring` interface contains methods for all sensitive configuration values,
which are primarily credentials used to access various ICANN and ICANN-
affiliated services (such as RDE). These values are only needed for real
production registries and PDT environments. If you are just playing around with
the platform at first, it is OK to put off defining these values until
necessary. To that end, a `DummyKeyringModule` is included that simply provides
an `InMemoryKeyring` populated with dummy values for all secret keys. This
allows the codebase to compile and run, but of course any actions that attempt
to connect to external services will fail because none of the keys are real.
To configure a production registry system, you will need to write a replacement
module for `DummyKeyringModule` that loads the credentials in a secure way, and
provides them using either an instance of `InMemoryKeyring` or your own custom
implementation of `Keyring`. You then need to replace all usages of
`DummyKeyringModule` with your own module in all of the per-service components
in which it is referenced. The functions in `PgpHelper` will likely prove
useful for loading keys stored in PGP format into the PGP key classes that
you'll need to provide from `Keyring`, and you can see examples of them in
action in `DummyKeyringModule`.
## Per-TLD configuration
`Registry` entities, which are persisted to Datastore, are used for per-TLD
configuration. They contain any kind of configuration that is specific to a
TLD, such as the create/renew price of a domain name, the pricing engine
implementation, the DNS writer implementation, whether escrow exports are
enabled, the default currency, the reserved label lists, and more. The
`update_tld` command in `registry_tool` is used to set all of these options.
See the "Registry tool" documentation for more information, as well as the
command-line help for the `update_tld` command. Unlike global configuration
above, per-TLD configuration options are stored as data in the running system,
and thus do not require code pushes to update.
[app-engine-config]: https://cloud.google.com/appengine/docs/java/configuration-files

@ -1,4 +0,0 @@
# Developing
Advice on how to do development on the Domain Registry codebase, including how
to set up an IDE environment and run tests.

@ -1,3 +0,0 @@
# Extension points
The various places the system can be extended by plugging in additional code.

@ -1,248 +0,0 @@
# Installation
Information on how to download and install the Domain Registry project and get a
working running instance.
## Prerequisites
* A recent version of the
[Java 7 JDK](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html)
(note that Java 8 support should be coming to App Engine soon).
* [Bazel](http://bazel.io/), which is the build system that
the Domain Registry project uses. The minimum required version is 0.3.1.
* [Google App Engine SDK for Java](https://cloud.google.com/appengine/downloads#Google_App_Engine_SDK_for_Java),
especially `appcfg`, a command-line tool that runs locally and communicates
with the App Engine cloud.
* [Create an application](https://cloud.google.com/appengine/docs/java/quickstart)
on App Engine to deploy to, and set up `appcfg` to connect to it.
## Downloading the code
Start off by grabbing the latest version from the
[Domain Registry project on GitHub](https://github.com/google/domain-registry).
This can be done either by cloning the Git repo (if you expect to make code
changes to contribute back), or simply by downloading the latest release as a
zip file. This guide will cover cloning from Git, but should work almost
identically for downloading the zip file.

    $ git clone git@github.com:google/domain-registry.git
    Cloning into 'domain-registry'...
    [ .. snip .. ]
    $ cd domain-registry
    $ ls
    apiserving       CONTRIBUTORS  java        LICENSE    scripts
    AUTHORS          docs          javascript  python     third_party
    CONTRIBUTING.md  google        javatests   README.md  WORKSPACE

The most important directories are:
* `docs` -- the documentation (including this install guide)
* `java/google/registry` -- all of the source code of the main project
* `javatests/google/registry` -- all of the tests for the project
* `python` -- Some Python reporting scripts
* `scripts` -- Scripts for configuring development environments
Everything else, especially `third_party`, contains dependencies that are used
by the project.
## Building and verifying the code
The first step is to verify that the project successfully builds. This will
also download and install dependencies.

    $ bazel --batch build //java{,tests}/google/registry/...
    INFO: Found 584 targets...
    [ .. snip .. ]
    INFO: Elapsed time: 124.433s, Critical Path: 116.92s

There may be some warnings thrown, but if there are no errors, then you are good
to go. Next, run the tests to verify that everything works properly. The tests
can be pretty resource intensive, so experiment with different values of
`--local_resources` (available RAM in MB, CPU cores, and I/O capacity) to
balance a short running time against slowing down your computer too badly.

    $ nice bazel --batch test //javatests/google/registry/... \
        --local_resources=1000,3,1.0
    Executed 360 out of 360 tests: 360 tests pass.

## Running a development instance locally
`RegistryTestServer` is a lightweight test server for the registry that is
suitable for running locally for development. It uses local versions of all
Google Cloud Platform dependencies, when available. Correspondingly, its
functionality is limited compared to a Domain Registry instance running on an
actual App Engine instance. To see its command-line parameters, run:

    $ bazel run //javatests/google/registry/server -- --help

Then to fire up an instance of the server, run:

    $ bazel run //javatests/google/registry/server {your params}

Once it is running, you can interact with it via normal `registry_tool`
commands, or view the registrar console in a web browser by navigating to
http://localhost:8080/registrar .
## Deploying the code
A working installation requires configuring a variety of things before it can
be deployed (see the Configuration guide for that). Before diving into that,
it's recommended to first confirm that the default version of the code can be
pushed at all, with the expectation that things won't work properly until they
are configured.
All of the [EAR](https://en.wikipedia.org/wiki/EAR_(file_format)) and
[WAR](https://en.wikipedia.org/wiki/WAR_(file_format)) files for the different
environments, which were built in the previous step, are output to the
`bazel-genfiles` directory as follows:

    $ (cd bazel-genfiles/java/google/registry && ls *.ear)
    registry_alpha.ear  registry.ear        registry_sandbox.ear
    registry_crash.ear  registry_local.ear
    $ (cd bazel-genfiles/java/google/registry && ls *.war)
    mandatory_stuff.war           registry_default_local.war
    registry_backend_alpha.war    registry_default_sandbox.war
    registry_backend_crash.war    registry_default.war
    registry_backend_local.war    registry_tools_alpha.war
    registry_backend_sandbox.war  registry_tools_crash.war
    registry_backend.war          registry_tools_local.war
    registry_default_alpha.war    registry_tools_sandbox.war
    registry_default_crash.war    registry_tools.war

Note that there is one EAR file per environment (production is the one without
an environment in the file name), whereas there is one WAR file per service per
environment, with there being three services in total: default, backend, and
tools.
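
The naming convention in the listing above can be expressed as a small helper. This is only a sketch inferred from the file names shown; the class `ArtifactNamesSketch` and its methods are hypothetical, the name `production` for the unsuffixed environment is an assumption, and `mandatory_stuff.war` is deliberately not covered because it does not follow the pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reproduces the EAR/WAR file names shown above.
final class ArtifactNamesSketch {
  // The three services named in this guide.
  static final String[] SERVICES = {"default", "backend", "tools"};

  // One EAR per environment; production carries no environment suffix.
  static String earName(String environment) {
    return "registry" + suffix(environment) + ".ear";
  }

  // One WAR per service per environment.
  static String warName(String service, String environment) {
    return "registry_" + service + suffix(environment) + ".war";
  }

  // All WAR files that make up a single environment.
  static List<String> warNames(String environment) {
    List<String> names = new ArrayList<String>();
    for (String service : SERVICES) {
      names.add(warName(service, environment));
    }
    return names;
  }

  // Assumption: the production environment is identified by "production".
  private static String suffix(String environment) {
    return environment.equals("production") ? "" : "_" + environment;
  }

  public static void main(String[] args) {
    System.out.println(earName("alpha"));
    for (String war : warNames("production")) {
      System.out.println(war);
    }
  }
}
```

A helper like this is mainly useful in deployment scripts that need to pick the right three WAR files for a given environment.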
Then, use `appcfg` to [deploy the WAR files](https://cloud.google.com/appengine/docs/java/tools/uploadinganapp):

    $ cd /path/to/downloaded/appengine/app
    $ /path/to/appcfg.sh update /path/to/registry_default.war
    $ /path/to/appcfg.sh update /path/to/registry_backend.war
    $ /path/to/appcfg.sh update /path/to/registry_tools.war

## Creating test entities
Once the code is deployed, the next step is to play around with creating some
entities in the registry, including a TLD, a registrar, a domain, a contact, and
a host. Note: Do this on a non-production environment! All commands below use
`registry_tool` to interact with the running registry system; see the
documentation on `registry_tool` for additional information on it. We'll assume
that all commands below are running in the `alpha` environment; if you named
your environment differently, then use that everywhere that `alpha` appears.
### Create a TLD
Pick the name of a TLD to create. For the purposes of this example we'll use
"example", which conveniently happens to be an ICANN reserved string, meaning
it'll never be created for real on the Internet at large.

    $ registry_tool -e alpha create_tld example --roid_suffix EXAMPLE \
        --initial_tld_state GENERAL_AVAILABILITY --tld_type TEST
    [ ... snip confirmation prompt ... ]
    Perform this command? (y/N): y
    Updated 1 entities.

The name of the TLD is the main parameter passed to the command. The initial
TLD state is set here to general availability, bypassing sunrise and landrush,
so that domain names can be created immediately in the following steps. The TLD
type is set to `TEST` (the other alternative being `REAL`) for obvious reasons.
`roid_suffix` is the suffix that will be used for repository ids of domains on
the TLD -- it must be all uppercase and a maximum of eight ASCII characters.
ICANN
[recommends](https://www.icann.org/resources/pages/correction-non-compliant-roids-2015-08-26-en)
a unique ROID suffix per TLD. The easiest way to come up with one is to simply
use the entire uppercased TLD string if it is eight characters or fewer, or
abbreviate it in some sensible way down to eight if it is longer. The full repo
id of a domain resource is a hex string followed by the suffix,
e.g. `12F7CDF3-EXAMPLE` for our example TLD.
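
The ROID suffix rules above are easy to check mechanically. The following is a minimal sketch under stated assumptions, not the registry's own validation code: `RoidSuffixSketch` and its methods are hypothetical, accepting digits alongside uppercase letters is an assumption, and plain truncation is just one naive way to "abbreviate in some sensible way".

```java
import java.util.Locale;

// Hypothetical helper illustrating the ROID suffix rules described above.
final class RoidSuffixSketch {
  static final int MAX_LENGTH = 8;

  // Valid if non-empty, at most eight characters, and made up only of ASCII
  // uppercase letters and (by assumption) ASCII digits.
  static boolean isValid(String suffix) {
    if (suffix.isEmpty() || suffix.length() > MAX_LENGTH) {
      return false;
    }
    for (int i = 0; i < suffix.length(); i++) {
      char c = suffix.charAt(i);
      boolean upper = c >= 'A' && c <= 'Z';
      boolean digit = c >= '0' && c <= '9';
      if (!upper && !digit) {
        return false;
      }
    }
    return true;
  }

  // Naive derivation: uppercase the TLD and truncate it to eight characters.
  // The result should still be passed through isValid().
  static String fromTld(String tld) {
    String upper = tld.toUpperCase(Locale.ROOT);
    return upper.length() <= MAX_LENGTH ? upper : upper.substring(0, MAX_LENGTH);
  }

  // Full repository id: an uppercase hex string, a hyphen, then the suffix.
  static String repoId(long id, String suffix) {
    if (!isValid(suffix)) {
      throw new IllegalArgumentException("Invalid ROID suffix: " + suffix);
    }
    return Long.toHexString(id).toUpperCase(Locale.ROOT) + "-" + suffix;
  }

  public static void main(String[] args) {
    String suffix = fromTld("example");
    System.out.println(repoId(0x12F7CDF3L, suffix));
  }
}
```

For TLDs longer than eight characters, a hand-picked abbreviation is usually preferable to truncation, since two long TLDs can share the same first eight characters.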
### Create a registrar
Now we need to create a registrar and give it access to operate on the example
TLD. For the purposes of our example we'll name the registrar "Acme".

    $ registry_tool -e alpha create_registrar acme --name 'ACME Corp' \
        --registrar_type TEST --password hunter2 \
        --icann_referral_email blaine@acme.example --street '123 Fake St' \
        --city 'Fakington' --state MA --zip 12345 --cc US --allowed_tlds example
    [ ... snip confirmation prompt ... ]
    Perform this command? (y/N): y
    Updated 1 entities.
    Skipping registrar groups creation because only production and sandbox
    support it.

In the command above, "acme" is the internal registrar id that is the primary
key used to refer to the registrar. The `name` is the display name that is used
less often, primarily in user interfaces. We again set the type of the resource
here to `TEST`. The `password` is the EPP password that the registrar uses to
log in. The `icann_referral_email` is the email address associated with
the initial creation of the registrar -- note that the registrar cannot change
it later. The address fields are self-explanatory (note that other parameters
are available for international addresses). The `allowed_tlds` parameter is a
comma-delimited list of TLDs that the registrar has access to, and here is set
to the example TLD.
### Create a contact
Now we want to create a contact, as a contact is required before a domain can be
created. Contacts can be used on any number of domains across any number of
TLDs, and contain the information on who owns or provides technical support for
a domain. These details will appear in WHOIS queries. Note the `-c` parameter,
which stands for client identifier: it appears on most `registry_tool`
commands and specifies the id of the registrar that the command will
be executed as. Contact, domain, and host creation all work by constructing
an EPP message that is sent to the registry, and EPP commands need to run under
the context of a registrar. The "acme" registrar that was created above is used
for this purpose.

    $ registry_tool -e alpha create_contact -c acme --id abcd1234 \
        --name 'John Smith' --street '234 Fake St' --city 'North Fakington' \
        --state MA --zip 23456 --cc US --email jsmith@e.mail
    [ ... snip EPP response ... ]

The `id` is the contact id, and is referenced elsewhere in the system (e.g. when
a domain is created and the admin contact is specified). The `name` is the
display name of the contact, which is usually the name of a company or of a
person. Again, the address fields are required, along with an `email`.
### Create a host
Hosts are used to specify the IP addresses (either v4 or v6) that are associated
with a given nameserver. Note that hosts may either be in-bailiwick (on a TLD
that this registry runs) or out-of-bailiwick. In-bailiwick hosts may
additionally be subordinate (a subdomain of a domain name that is on this
registry). Let's create an out-of-bailiwick nameserver, which is the simplest
type.

    $ registry_tool -e alpha create_host -c acme --host ns1.google.com
    [ ... snip EPP response ... ]

Note that hosts are required to have IP addresses if they are subordinate, and
must not have IP addresses if they are not subordinate. Use the `--addresses`
parameter to set the IP addresses on a host, passing in a comma-delimited list
of IP addresses in either IPv4 or IPv6 format.
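
The host categories and the IP address rule can be summarized in code. This is a sketch of the rules as stated in this section, not the registry's actual flow logic: `HostRulesSketch`, `Kind`, and both methods are hypothetical, and host names are assumed to be lowercase and already validated.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the host rules described above.
final class HostRulesSketch {
  enum Kind { OUT_OF_BAILIWICK, IN_BAILIWICK, SUBORDINATE }

  // tlds: TLDs that this registry runs.
  // domains: domain names that exist on this registry.
  static Kind classify(String host, Set<String> tlds, Set<String> domains) {
    for (String domain : domains) {
      if (host.endsWith("." + domain)) {
        return Kind.SUBORDINATE; // subdomain of a domain on this registry
      }
    }
    for (String tld : tlds) {
      if (host.endsWith("." + tld)) {
        return Kind.IN_BAILIWICK; // on one of our TLDs, but not subordinate
      }
    }
    return Kind.OUT_OF_BAILIWICK;
  }

  // Subordinate hosts must have IP addresses; all other hosts must not.
  static boolean addressesAllowed(Kind kind, List<String> addresses) {
    return (kind == Kind.SUBORDINATE) == !addresses.isEmpty();
  }

  public static void main(String[] args) {
    Set<String> tlds = new HashSet<String>(Arrays.asList("example"));
    Set<String> domains = new HashSet<String>(Arrays.asList("fake.example"));
    List<String> hosts =
        Arrays.asList("ns1.google.com", "ns1.other.example", "ns1.fake.example");
    for (String host : hosts) {
      System.out.println(host + " -> " + classify(host, tlds, domains));
    }
  }
}
```

Under these rules, `ns1.google.com` from the command above is out-of-bailiwick and therefore must be created without `--addresses`.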
### Create a domain
To tie it all together, let's create a domain name that uses the above contact
and host.

    $ registry_tool -e alpha create_domain -c acme --domain fake.example \
        --admin abcd1234 --tech abcd1234 --registrant abcd1234 \
        --nameservers ns1.google.com
    [ ... snip EPP response ... ]

Note how the same contact id (from above) is used for the administrative,
technical, and registrant contact. This is quite common on domain names.
To verify that everything worked, let's query the WHOIS information for
fake.example:

    $ registry_tool -e alpha whois_query fake.example
    [ ... snip WHOIS response ... ]

You should see all of the information in WHOIS that you entered above for the
contact, nameserver, and domain.

@ -1,90 +0,0 @@
# Registry tool
The registry tool is a command-line registry administration tool that is invoked
using the `registry_tool` command. It has the ability to view and change a
large number of things in a running domain registry environment, including
creating registrars, updating premium and reserved lists, running an EPP command
from a given XML file, and performing various backend tasks like re-running RDE
if the most recent export failed. Its code lives inside the tools package
(`java/google/registry/tools`), and is compiled by building the `registry_tool`
target in the Bazel BUILD file in that package.
To build the tool and display its command-line help, execute this command:

    $ bazel run //java/google/registry/tools:registry_tool -- --help

For future invocations you should alias the compiled binary in the
`bazel-genfiles/java/google/registry` directory or add it to your path so that
you can run it more easily. The rest of this guide assumes that it has been
aliased to `registry_tool`.
The registry tool is always called with a specific environment to run in using
the `-e` parameter. This looks like:

    $ registry_tool -e production {command name} {command parameters}

To see a list of all available commands along with usage information, run
`registry_tool` without specifying a command name, e.g.:

    $ registry_tool -e alpha

Note that the documentation for the commands comes from JCommander, which parses
metadata contained within the code to yield documentation.
## Tech support commands
There are actually two separate tools: `gtech_tool`, which is a collection of
lower-impact commands intended to be used by tech support personnel, and
`registry_tool`, which is a superset of `gtech_tool` that contains additional
commands that are potentially more destructive and can change more aspects of
the system. A full list of `gtech_tool` commands can be found in
`GtechTool.java`, and the additional commands that only `registry_tool` has
access to are in `RegistryTool.java`.
## Local and server-side commands
There are two broad ways that commands are implemented: some that send requests
to `ToolsServlet` to execute the action on the server (these commands implement
`ServerSideCommand`), and others that execute the command locally using the
[Remote API](https://cloud.google.com/appengine/docs/java/tools/remoteapi)
(these commands implement `RemoteApiCommand`). Server-side commands take more
work to implement because they require both a client and a server-side
component, e.g. `CreatePremiumListCommand.java` and
`CreatePremiumListAction.java` respectively for creating a premium list.
However, they are fully capable of doing anything that is possible with App
Engine, including running a large MapReduce, because they execute on the tools
service in the App Engine cloud.
Local commands, by contrast, are easier to implement, because there is only a
local component to write, but they aren't as powerful. A general rule of thumb
for making this determination is to use a local command if possible, or a
server-side command otherwise.
## Common tool patterns
All tools ultimately implement the `Command` interface located in the `tools`
package. If you use an IDE such as Eclipse to view the type hierarchy of that
interface, you'll see all of the commands that exist, as well as how a lot of
them are grouped using sub-interfaces or abstract classes that provide
additional functionality. The most common patterns that are used by a large
number of other tools are:
* **`BigqueryCommand`** -- Provides a connection to BigQuery for tools that need
it.
* **`ConfirmingCommand`** -- Provides the methods `prompt()` and `execute()` to
override. `prompt()` outputs a message (usually what the command is going to
do) and prompts the user to confirm execution of the command, and then
`execute()` actually does it.
* **`EppToolCommand`** -- Commands that work by executing EPP commands against
the server, usually by filling in a template with parameters that were passed
on the command-line.
* **`MutatingEppToolCommand`** -- A sub-class of `EppToolCommand` that provides
a `--dry_run` flag that, if passed, displays the server's output describing
what the command would have done without actually committing those changes.
* **`GetEppResourceCommand`** -- Gets individual EPP resources from the server
and outputs them.
* **`ListObjectsCommand`** -- Lists all objects of a specific type from the
server and outputs them.
* **`MutatingCommand`** -- Provides a facility to create or update entities in
Datastore, and uses a diff algorithm to display the changes that will be made
before committing them.
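
To illustrate the prompt-then-execute pattern described for `ConfirmingCommand`, here is a minimal, self-contained sketch. It is not the project's class: `ConfirmingSketch`, `CreateTldSketch`, the `run` method, and the "Aborted." message are hypothetical, and only the `prompt()`/`execute()` split and the confirmation question mirror what this guide shows.

```java
import java.io.PrintStream;
import java.util.Scanner;

// Hypothetical base class: subclasses describe the action in prompt() and
// perform it in execute(); run() only calls execute() after a "y" answer.
abstract class ConfirmingSketch {
  protected abstract String prompt();

  protected abstract String execute();

  final boolean run(Scanner in, PrintStream out) {
    out.println(prompt());
    out.print("Perform this command? (y/N): ");
    String answer = in.hasNextLine() ? in.nextLine().trim() : "";
    if (!answer.equalsIgnoreCase("y")) {
      out.println("Aborted.");
      return false;
    }
    out.println(execute());
    return true;
  }
}

// Hypothetical command that pretends to create a TLD.
final class CreateTldSketch extends ConfirmingSketch {
  private final String tld;
  boolean executed = false;

  CreateTldSketch(String tld) {
    this.tld = tld;
  }

  @Override
  protected String prompt() {
    return "Create TLD '" + tld + "'";
  }

  @Override
  protected String execute() {
    executed = true; // a real command would persist entities here
    return "Updated 1 entities.";
  }

  public static void main(String[] args) {
    new CreateTldSketch("example").run(new Scanner(System.in), System.out);
  }
}
```

Keeping the confirmation logic in the base class means every destructive command gets the same safety prompt without each one re-implementing it.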