# Proxy Setup Instructions

This doc covers procedures to configure, build, and deploy the [Netty](https://netty.io)-based proxy onto [Kubernetes](https://kubernetes.io) clusters. [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/) is used as the deployment target. Any Kubernetes cluster should in theory work, but the user will need to replace dependencies on other GCP features, such as Cloud KMS for key management and Stackdriver for monitoring.

## Overview

Nomulus runs on Google App Engine, which only supports HTTP(S) traffic. In order to work with [EPP](https://tools.ietf.org/html/rfc5730.html) (TCP port 700) and [WHOIS](https://tools.ietf.org/html/rfc3912) (TCP port 43), a proxy is needed to relay traffic between clients and Nomulus and do protocol translation.

We provide a Netty-based proxy that runs as a standalone service (separate from Nomulus), either on a VM or on a Kubernetes cluster. Deploying to Kubernetes is recommended, as it provides automatic scaling and management for Docker containers, which alleviates much of the pain of running a production service.

The procedure described here can be used to set up a production environment, as most of the steps only need to be performed once per environment. However, proper release management (cutting a release, rolling updates, canary analysis, reliable rollback, etc.) is not covered. The user is advised to use a service like [Spinnaker](https://www.spinnaker.io/) for release management.

## Detailed Instructions

We use [`gcloud`](https://cloud.google.com/sdk/gcloud/) and [`terraform`](https://terraform.io) to configure the proxy project on GCP, and [`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/) to deploy the proxy to the project. Additionally, [`gsutil`](https://cloud.google.com/storage/docs/gsutil) is used to create the GCS bucket that stores the Terraform state file. These instructions assume that all four tools are installed.

### Setup GCP project

There are three projects involved:

- Nomulus project: the project that hosts Nomulus.
- Proxy project: the project that hosts this proxy.
- GCR ([Google Container Registry](https://cloud.google.com/container-registry/)) project: the project from which the proxy pulls its Docker image.

We recommend using the same project for Nomulus and the proxy, so that logs for both are collected in the same place and easily accessible. If there are multiple Nomulus projects (environments), such as production, sandbox, alpha, etc., it is recommended to use just one of them as the GCR project. This way the same proxy images are deployed to each environment, and what is running in production is the same image previously tested in sandbox.

The following document outlines the procedure to set up the proxy for one environment.

In the proxy project, create a GCS bucket to store the Terraform state file:

```bash
$ gsutil config # only if you haven't run gsutil before.
$ gsutil mb -p <proxy-project> gs://<bucket-name>/
```

### Obtain a domain and SSL certificate

The proxy exposes two endpoints, whois.\<yourdomain.tld\> and epp.\<yourdomain.tld\>. The base domain \<yourdomain.tld\> needs to be obtained from a registrar ([Google Domains](https://domains.google) for example). Nomulus operators can also self-allocate a domain in the TLDs under management.

The [EPP protocol over TCP](https://tools.ietf.org/html/rfc5734) requires a client-authenticated SSL connection. The operator of the proxy needs to obtain an SSL certificate for the domain epp.\<yourdomain.tld\>. [Let's Encrypt](https://letsencrypt.org) offers SSL certificates free of charge, but any other CA can fill the role.
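
As an illustration, one way to obtain such a certificate from Let's Encrypt is with the [`certbot`](https://certbot.eff.org/) client in manual DNS-challenge mode; the tool choice and flags below are an example, not something required by the proxy:

```bash
# Example only: request a certificate for the EPP endpoint using certbot's
# manual DNS-01 challenge. Any ACME client (or any other CA) works as well.
$ certbot certonly --manual --preferred-challenges dns -d epp.<yourdomain.tld>
```

With certbot, the resulting `fullchain.pem` and `privkey.pem` (typically written under `/etc/letsencrypt/live/epp.<yourdomain.tld>/`) correspond to the certificate and private key used in the next step.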

Concatenate the certificate and its private key into one file:

```bash
$ cat <certificate.pem> <private.key> > <combined_secret.pem>
```

The order of the certificate and the private key inside the combined file does not matter. However, if the certificate file is chained, i.e. it contains not only the certificate for your domain but also certificates from intermediate CAs, the certificates must appear in order: each certificate's issuer must be the subject of the certificate that follows it.
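
To sanity-check the ordering, you can print the subject and issuer of every certificate in the chained file with `openssl` (a quick manual check, not part of the required setup):

```bash
# Prints "subject=..." / "issuer=..." pairs for each certificate in the file;
# each issuer should match the subject of the certificate printed after it.
$ openssl crl2pkcs7 -nocrl -certfile <certificate.pem> \
  | openssl pkcs7 -print_certs -noout
```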

The certificate will be encrypted by Cloud KMS and uploaded to a GCS bucket. The bucket will be created automatically by Terraform.

### Setup proxy project

First set up [Application Default Credentials](https://cloud.google.com/docs/authentication/production) locally:

```bash
$ gcloud auth application-default login
```

Log in with an account that has the "Project Owner" role on all three projects mentioned above.

Navigate to `proxy/terraform`, create a folder called `envs`, and inside it, create a folder for the environment that the proxy is deployed to ("alpha" for example). Copy `example_config.tf` and `outputs.tf` to the environment folder.

```bash
$ cd proxy/terraform
$ mkdir -p envs/alpha
$ cp example_config.tf envs/alpha/config.tf
$ cp outputs.tf envs/alpha
```

Now go to the environment folder, edit the `config.tf` file, and replace the placeholders with actual project and domain names.

Run Terraform:

```bash
$ cd envs/alpha
(edit config.tf)
$ terraform init -upgrade
$ terraform apply
```

Go over the proposed changes and answer "yes". Terraform will start configuring the projects, including setting up clusters, keyrings, the load balancer, etc. This takes a couple of minutes.

### Setup Nomulus

After Terraform completes, it outputs some information, among which is the client ID of the service account created for the proxy. This needs to be added to the Nomulus configuration file so that Nomulus accepts traffic from the proxy. Edit the following section in `java/google/registry/config/files/nomulus-config-<env>.yaml` and redeploy Nomulus:

```yaml
oAuth:
  allowedOauthClientIds:
    - <client_id>
```

### Setup nameservers

The Terraform output (run `terraform output` in the environment folder to show it again) also shows the nameservers of the proxy domain (\<yourdomain.tld\>). Delegate this domain to these nameservers (through your registrar). If the domain is self-allocated by Nomulus, run:

```bash
$ nomulus -e production update_domain <yourdomain.tld> \
  -c <registrar_client_name> -n <nameserver1>,<nameserver2>,...
```
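
Once the delegation has propagated, you can verify it with a standard DNS lookup (a quick sanity check, not a required step):

```bash
# The nameservers from the Terraform output should be listed here.
$ dig +short NS <yourdomain.tld>
```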

### Setup named ports

Unfortunately, Terraform currently cannot add named ports on the instance groups of the GKE clusters it manages. [Named ports](https://cloud.google.com/compute/docs/load-balancing/http/backend-service#named_ports) are needed for the load balancer it sets up to route traffic to the proxy. To set the named ports, in the environment folder, do:

```bash
$ bash ../../update_named_ports.sh
```

### Encrypt the certificate with Cloud KMS

With the newly set up Cloud KMS key, encrypt the certificate/key combo file created earlier:

```bash
$ gcloud kms encrypt --plaintext-file <combined_secret.pem> \
  --ciphertext-file - --key <key-name> --keyring <keyring-name> --location \
  global | base64 > <combined_secret.pem.enc>
```

The encrypted file is then uploaded to the GCS bucket specified in the `config.tf` file:

```bash
$ gsutil cp <combined_secret.pem.enc> gs://<your-certificate-bucket>
```
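
To verify the round trip before deploying, you can decrypt the uploaded file and compare it with the original (a manual sanity check; the proxy itself performs the equivalent decryption at startup):

```bash
# Decode the base64 wrapper, decrypt with the same KMS key, and diff against
# the original combined file; no output from diff means the round trip works.
$ gsutil cat gs://<your-certificate-bucket>/<combined_secret.pem.enc> \
  | base64 --decode \
  | gcloud kms decrypt --ciphertext-file - --plaintext-file - \
    --key <key-name> --keyring <keyring-name> --location global \
  | diff - <combined_secret.pem>
```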

### Edit proxy config file

Proxy configuration files are at `java/google/registry/proxy/config/`. There is a default config that provides most values needed to run the proxy, and several environment-specific configs for proxy instances that communicate with different Nomulus environments. The values specified in the environment-specific file override those in the default file.

The values that need to be changed include the project name, the Nomulus endpoint, the encrypted certificate/key combo filename and the GCS bucket it is stored in, the Cloud KMS keyring and key names, etc. Refer to the default file for a detailed description of each field.

### Upload proxy docker image to GCR

Edit the `proxy_push` rule in `java/google/registry/proxy/BUILD` to add the GCR project name and the image name to push to. Note that as currently set up, all images pushed to GCR are tagged `bazel`, and the GKE deployment object loads the image tagged `bazel`. This is fine for testing, but for production one should give images unique tags (also configured in the `proxy_push` rule).

To push to GCR, run:

```bash
$ bazel run java/google/registry/proxy:proxy_push
```

### Deploy proxy

Terraform by default creates three clusters, in the Americas, EMEA, and APAC, respectively. We will have to deploy to each cluster separately. The cluster information is shown by `terraform output` as well.

The deployment is defined in two files, `proxy-deployment-<env>.yaml` and `proxy-service.yaml`. Edit `proxy-deployment-<env>.yaml` for your environment, filling in the GCR project name and image name. You can also change the arguments in the file, for example to turn on logging. To deploy to a cluster:

```bash
# Get credentials to deploy to a cluster.
$ gcloud container clusters get-credentials --project <proxy-project> \
  --zone <cluster-zone> <cluster-name>

# Deploys environment-specific kubernetes objects.
$ kubectl create -f \
  proxy/kubernetes/proxy-deployment-<env>.yaml

# Deploys shared kubernetes objects.
$ kubectl create -f \
  proxy/kubernetes/proxy-service.yaml
```

Repeat this for all three clusters.

### Afterwork

Remember to turn on [Stackdriver Monitoring](https://cloud.google.com/monitoring/docs/) for the proxy project, as we use it to collect metrics from the proxy.

You are done! The proxy should be running now. Store the private key safely, or delete it, as the encrypted copy now ships with the proxy. See "Additional steps" in the appendix for other things to check.

## Appendix

Here we give detailed instructions on how to manually configure a GCP project to host the proxy. We strongly recommend against doing so, because it is tedious and error-prone; using Terraform is much easier. The following instructions are provided for educational purposes, so that readers can understand why we set up the infrastructure this way. The Terraform config is essentially a translation of the following procedure.

### Set default project

The proxy can run in its own GCP project, or use the existing project that also hosts Nomulus. We recommend initializing the [`gcloud`](https://cloud.google.com/sdk/gcloud/) config to use that project as the default, as it avoids having to provide the `--project` flag for every `gcloud` command:

```bash
$ gcloud init
```

Follow the prompt and choose the project you want to deploy the proxy to. You can skip picking default regions and zones, as we will explicitly create clusters in multiple zones to provide geographical redundancy.

### Create service account

The proxy will run with the credentials of a [service account](https://cloud.google.com/compute/docs/access/service-accounts). In theory it could take advantage of [Application Default Credentials](https://cloud.google.com/docs/authentication/production) and use the service account of the GCE instances underpinning the GKE cluster, but we recommend creating a separate service account. With a dedicated service account, one can grant only the permissions necessary for the proxy. To create a service account:

```bash
$ gcloud iam service-accounts create proxy-service-account \
  --display-name "Service account for Nomulus proxy"
```

Generate a `.json` key file for the newly created service account. The key file contains the secret necessary to construct credentials for the service account and needs to be stored safely (it should be deleted later).

```bash
$ gcloud iam service-accounts keys create proxy-key.json --iam-account \
  <service-account-email>
```

A `proxy-key.json` file will be created inside the current working directory.

The `client_id` inside the key file needs to be added to the Nomulus configuration file so that Nomulus accepts the OAuth tokens generated for this service account. Add its value to `java/google/registry/config/files/nomulus-config-<env>.yaml`:

```yaml
oAuth:
  allowedOauthClientIds:
    - <client_id>
```

Redeploy Nomulus for the change to take effect.

Also bind the "Logs Writer" role to the proxy service account so that it can write logs to [Stackdriver Logging](https://cloud.google.com/logging/).

```bash
$ gcloud projects add-iam-policy-binding <project-id> \
  --member serviceAccount:<service-account-email> \
  --role roles/logging.logWriter
```

### Obtain a domain and SSL certificate

A domain is needed (if you do not want to rely on IP addresses) for clients to communicate with the proxy. Domains can be purchased from a domain registrar ([Google Domains](https://domains.google) for example). A Nomulus operator could also consider self-allocating a domain under an owned TLD instead.

An SSL certificate is needed because [EPP over TCP](https://tools.ietf.org/html/rfc5734) requires SSL. You can obtain an SSL certificate for the domain name you intend to serve as the EPP endpoint (epp.nic.tld for example) free of charge from [Let's Encrypt](https://letsencrypt.org). For now, you will need to manually renew your certificate before it expires.

### Create keyring and encrypt the certificate/private key

The proxy needs access to both the private key and the certificate. Do *not* package them directly with the proxy. Instead, use [Cloud KMS](https://cloud.google.com/kms/) to encrypt them, ship the encrypted file with the proxy, and call Cloud KMS to decrypt them on the fly. (If you want to use another keyring solution, you will have to modify the proxy and implement your own.)

Concatenate the private key file with the certificate. It does not matter which file is appended to which. However, if the certificate file is a chained `.pem` file, make sure that the certificates appear in order, i.e. the issuer of one certificate is the subject of the next certificate:

```bash
$ cat <private-key.key> <chained-certificates.pem> >> ssl-cert-key.pem
```

Create a keyring and a key in Cloud KMS, and use the key to encrypt the combined file:

```bash
# create keyring
$ gcloud kms keyrings create <keyring-name> --location global

# create key
$ gcloud kms keys create <key-name> --purpose encryption --location global \
  --keyring <keyring-name>

# encrypt the combined file using the key
$ gcloud kms encrypt --plaintext-file ssl-cert-key.pem \
  --ciphertext-file ssl-cert-key.pem.enc \
  --key <key-name> --keyring <keyring-name> --location global
```

A file named `ssl-cert-key.pem.enc` will be created. Upload it to a GCS bucket in the proxy project. To create a bucket and upload the file:

```bash
$ gsutil mb -p <proxy-project> gs://<bucket-name>
$ gsutil cp ssl-cert-key.pem.enc gs://<bucket-name>
```

The proxy service account needs the "Cloud KMS CryptoKey Decrypter" role to decrypt the file using Cloud KMS:

```bash
$ gcloud projects add-iam-policy-binding <project-id> \
  --member serviceAccount:<service-account-email> \
  --role roles/cloudkms.cryptoKeyDecrypter
```

The service account also needs the "Storage Object Viewer" role to retrieve the encrypted file from GCS:

```bash
$ gsutil iam ch \
  serviceAccount:<service-account-email>:roles/storage.objectViewer \
  gs://<bucket-name>
```

### Proxy configuration

Proxy configuration files are at `java/google/registry/proxy/config/`. There is a default config that provides most values needed to run the proxy, and several environment-specific configs for proxy instances that communicate with different Nomulus environments. The values specified in the environment-specific file override those in the default file.

The values that need to be changed include the project name, the Nomulus endpoint, the encrypted certificate/key combo filename (`ssl-cert-key.pem` in the above example), the Cloud KMS keyring and key names, etc. Refer to the default file for a detailed description of each field.

### Setup Stackdriver for the project

The proxy streams metrics to [Stackdriver](https://cloud.google.com/stackdriver/). Refer to the [Stackdriver Monitoring](https://cloud.google.com/monitoring/docs/) documentation on how to enable monitoring for the GCP project.

The proxy service account needs the ["Monitoring Metric Writer"](https://cloud.google.com/monitoring/access-control#predefined_roles) role in order to stream metrics to Stackdriver:

```bash
$ gcloud projects add-iam-policy-binding <project-id> \
  --member serviceAccount:<service-account-email> \
  --role roles/monitoring.metricWriter
```

### Create GKE clusters

We recommend creating several clusters in different zones for better geographical redundancy and better network performance, for example clusters in the Americas, EMEA, and APAC. It is also a good idea to enable [autorepair](https://cloud.google.com/kubernetes-engine/docs/concepts/node-auto-repair), [autoupgrade](https://cloud.google.com/kubernetes-engine/docs/concepts/node-auto-upgrades), and [autoscaling](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler) on the clusters.

The default Kubernetes version on GKE is usually old; consider specifying a newer version when creating the cluster, to save time upgrading the nodes immediately afterwards.

```bash
$ gcloud container clusters create proxy-americas-cluster --enable-autorepair \
  --enable-autoupgrade --enable-autoscaling --max-nodes=3 --min-nodes=1 \
  --zone=us-east1-c --cluster-version=1.9.4-gke.1 --tags=proxy-cluster \
  --service-account=<service-account-email>
```

We give the GCE instances inside the cluster the same credentials as the proxy service account, which makes it easier to limit the permissions granted to service accounts. If we used the default GCE service account, we would have to grant it permission to read from GCR in order to download proxy images when creating pods, which would give *any* GCE instance with the default service account that permission.

Note the `--tags` flag: it applies the tag to all GCE instances running in the cluster, making it easier to set up firewall rules later on. Use the same tag for all clusters.

Repeat this for all the zones you want to create clusters in.

### Upload proxy docker image to GCR

The GKE deployment manifest is set up to pull the proxy docker image from [Google Container Registry](https://cloud.google.com/container-registry/) (GCR). Instead of using `docker` and `gcloud` to build and push images, respectively, we provide `bazel` rules for the same tasks. To push an image, first use [`docker-credential-gcr`](https://github.com/GoogleCloudPlatform/docker-credential-gcr) to obtain the necessary credentials. They are used by the [bazel container_push rules](https://github.com/bazelbuild/rules_docker#authentication) to push the image.

After credentials are configured, edit the `proxy_push` rule in `java/google/registry/proxy/BUILD` to add the GCP project name and the image name to push to. We recommend using the same project and image for proxies intended for different Nomulus environments; this way one can deploy the same proxy image first to sandbox for testing, and then to production.

Also note that as currently set up, all images pushed to GCR are tagged `bazel`, and the GKE deployment object loads the image tagged `bazel`. This is fine for testing, but for production one should give images unique tags (also configured in the `proxy_push` rule).

To push to GCR, run:

```bash
$ bazel run java/google/registry/proxy:proxy_push
```
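
To confirm that the image made it to GCR and to see which tags it carries (e.g. the default `bazel` tag discussed above), you can list its tags; the project and image names below are placeholders for whatever was configured in the `proxy_push` rule:

```bash
# Lists digests and tags of the pushed proxy image in GCR.
$ gcloud container images list-tags gcr.io/<gcr-project>/<image-name>
```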

If the GCP project that hosts the images (the GCR project) is different from the project that the proxy runs in (the proxy project), grant the service account the "Storage Object Viewer" role on the GCR project:

```bash
$ gcloud projects add-iam-policy-binding <image-project> \
  --member serviceAccount:<service-account-email> \
  --role roles/storage.objectViewer
```

### Upload proxy service account key to GKE cluster

The kubernetes pods (containers) are configured to read the proxy service account key file from a secret resource stored in the cluster.

First set the cluster credentials in `gcloud` so that `kubectl` knows which cluster to manage:

```bash
$ gcloud container clusters get-credentials proxy-americas-cluster \
  --zone us-east1-c
```

To upload the key file as a secret named `service-account`, stored under the key `service-account-key.json`:

```bash
$ kubectl create secret generic service-account \
  --from-file=service-account-key.json=<service-account-key.json>
```

More details on using service accounts on GKE can be found [here](https://cloud.google.com/kubernetes-engine/docs/tutorials/authenticating-to-cloud-platform).

Repeat the same step for all clusters you want to deploy to. Use `gcloud` to switch context, and then `kubectl` to upload the key.

### Deploy proxy to GKE clusters

Use `kubectl` to create the deployment and autoscale objects:

```bash
$ kubectl create -f \
  proxy/kubernetes/proxy-deployment-alpha.yaml
```

The kubernetes [deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) object specifies the images to run, along with their parameters. The [autoscale](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) object changes the number of pods running based on CPU load. This is different from GKE cluster autoscaling, which changes the number of nodes (VMs) running based on pod resource requests. Ideally, if there is no load, just one pod will be running in a cluster, resulting in only one node running as well, saving resources.
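
For reference, the autoscale object defined in the yaml is roughly what the following imperative command would create; the deployment name and thresholds here are placeholders, and the shipped yaml file is authoritative:

```bash
# Illustrative only: scale <deployment-name> between 1 and 5 replicas,
# targeting 80% average CPU utilization across pods.
$ kubectl autoscale deployment <deployment-name> --min=1 --max=5 --cpu-percent=80
```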

Repeat the same step for all clusters you want to deploy to.

### Expose the proxy service

The proxies running on GKE clusters need to be exposed to the outside. Do not use the Kubernetes [`LoadBalancer`](https://kubernetes.io/docs/concepts/services-networking/service/#type-loadbalancer) service type. It would create a GCP [Network Load Balancer](https://cloud.google.com/compute/docs/load-balancing/network/), which has several problems:

- This load balancer does not terminate TCP connections. It simply acts as an edge router that forwards IP packets to a "healthy" node in the cluster. As such, it does not support IPv6, because GCE instances themselves are currently IPv4 only.
- IP packets that arrive on a node may be routed to another node for reasons of capacity and availability. In doing so, Kubernetes will [SNAT](https://en.wikipedia.org/wiki/Network_address_translation#SNAT) the packet, therefore losing the source IP information that the proxy needs. The proxy uses the WHOIS source IP address to cap QPS and passes the EPP source IP to Nomulus for validation. Note that a TCP-terminating load balancer also has this problem, as the source IP becomes that of the load balancer, but it can be addressed in other ways (explained later). See [here](https://kubernetes.io/docs/tutorials/services/source-ip/) for more details on how Kubernetes routes traffic and translates source IPs inside the cluster.
- Acting as an edge router, this type of load balancer can only work within a given region, as each GCP region forms its own subnet. Therefore multiple load balancers, and IP addresses, would be needed if the proxy were to run in multiple regional clusters.

Instead, we split the task of exposing the proxy to the Internet into two parts: first expose it within the cluster, then expose the cluster to the outside through a [TCP Proxy Load Balancer](https://cloud.google.com/compute/docs/load-balancing/tcp-ssl/tcp-proxy). This load balancer terminates TCP connections and allows for the use of a single anycast IP address (IPv4 and IPv6) to reach any of the clusters connected to its backend (it chooses a particular cluster based on geographical proximity). From this point forward we will refer to this type of load balancer simply as the load balancer.

#### Set up proxy NodePort service

Kubernetes pods and nodes are [ephemeral](https://kubernetes.io/docs/concepts/services-networking/service/). A pod may crash and be killed, and a new pod will be spun up by the master node to fill its role. Similarly, a node may be shut down due to under-utilization (thanks to GKE autoscaling). In order to reliably route incoming traffic to the proxy, a [NodePort](https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport) service is used to expose the proxy on specified port(s) on every running node in the cluster, even if the proxy is not running on a given VM (in which case the traffic is routed to a VM that has the proxy running). With a `NodePort` service, the load balancer can always route traffic to any healthy node, and kubernetes takes care of delivering that traffic to a servicing proxy pod.

To deploy the NodePort service:

```bash
$ kubectl create -f \
  proxy/kubernetes/proxy-service.yaml
```

This service object will open up port 30000 (health check), 30001 (WHOIS), and 30002 (EPP) on the nodes, routing to the same ports inside a pod.
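
You can confirm that the node ports are in place after creating the service; the exact service name and output depend on what `proxy-service.yaml` defines:

```bash
# The PORT(S) column of the proxy's service should show node ports 30000,
# 30001 and 30002; kubectl describe gives the per-port details.
$ kubectl get services
$ kubectl describe service <proxy-service-name>
```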

Repeat this for all clusters.

#### Map named ports in GCE instance groups

GKE uses GCE as its underlying infrastructure. A GKE cluster (or more precisely, a node pool) corresponds to a GCE instance group. In order to receive traffic from a load balancer backend, an instance group needs to designate the ports that are to receive traffic, by giving them names (i.e. making them "named ports").

As mentioned above, the Kubernetes `NodePort` service object sets up three ports to receive traffic (30000, 30001, and 30002). Port 30000 is used by the health check protocol (discussed later) and does not need to be explicitly named.

First obtain the instance group names for the clusters:

```bash
$ gcloud compute instance-groups list
```

They start with `gke` and have the cluster names in them, so they should be easy to spot.

Then set the named ports:

```bash
$ gcloud compute instance-groups set-named-ports <instance-group> \
  --named-ports whois:30001,epp:30002 --zone <zone>
```

Repeat this for each instance group (cluster).

#### Set up firewall rules to allow traffic from the load balancer

By default, inbound traffic from the load balancer is dropped by the GCE firewall. A new firewall rule needs to be added to explicitly allow TCP packets originating from the load balancer to reach the three ports opened by the `NodePort` service on the nodes.

```bash
$ gcloud compute firewall-rules create proxy-loadbalancer \
  --source-ranges 130.211.0.0/22,35.191.0.0/16 \
  --target-tags proxy-cluster \
  --allow tcp:30000,tcp:30001,tcp:30002
```

The target tag controls which GCE VMs can receive traffic allowed by this rule. It is the same tag used during cluster creation. Since we use the same tag for all clusters, this rule applies to all VMs running the proxy. The load balancer source IP ranges are taken from [here](https://cloud.google.com/compute/docs/load-balancing/tcp-ssl/tcp-proxy#config-hc-firewall).

#### Create health check

The load balancer sends TCP requests to a designated port on each backend VM to probe whether the VM is healthy enough to serve traffic. The proxy by default uses port 30000 (which is exposed as the same port on the node) for health checks and returns a pre-configured response (`HEALTH_CHECK_RESPONSE`) when an expected request (`HEALTH_CHECK_REQUEST`) is received. To add the health check:

```bash
$ gcloud compute health-checks create tcp proxy-health \
  --description "Health check on port 30000 for Nomulus proxy" \
  --port 30000 --request "HEALTH_CHECK_REQUEST" --response "HEALTH_CHECK_RESPONSE"
```

#### Create load balancer backend

The load balancer backend configures which instance groups the load balancer sends packets to. We have already set up a `NodePort` service on each node in all the clusters to ensure that traffic to any of the exposed node ports will be routed to the corresponding port on a proxy pod. The backend service codifies which ports on the clusters' nodes should receive traffic from the load balancer.

Create one backend service for EPP and one for WHOIS:

```bash
# EPP backend
$ gcloud compute backend-services create proxy-epp-loadbalancer \
  --global --protocol TCP --health-checks proxy-health --timeout 1h \
  --port-name epp

# WHOIS backend
$ gcloud compute backend-services create proxy-whois-loadbalancer \
  --global --protocol TCP --health-checks proxy-health --timeout 1h \
  --port-name whois
```

These two backend services route packets to the epp named port and the whois named port on any instance group attached to them, respectively.

Then add (attach) the instance groups that the proxies run on to each backend service:

```bash
# EPP backend
$ gcloud compute backend-services add-backend proxy-epp-loadbalancer \
  --global --instance-group <instance-group> --instance-group-zone <zone> \
  --balancing-mode UTILIZATION --max-utilization 0.8

# WHOIS backend
$ gcloud compute backend-services add-backend proxy-whois-loadbalancer \
  --global --instance-group <instance-group> --instance-group-zone <zone> \
  --balancing-mode UTILIZATION --max-utilization 0.8
```

Repeat this for each instance group.

#### Reserve static IP addresses for the load balancer frontend

These are the public IP addresses that receive all outside traffic. We need one address for IPv4 and one for IPv6:

```bash
# IPv4
$ gcloud compute addresses create proxy-ipv4 \
  --description "Global static anycast IPv4 address for Nomulus proxy" \
  --ip-version IPV4 --global

# IPv6
$ gcloud compute addresses create proxy-ipv6 \
  --description "Global static anycast IPv6 address for Nomulus proxy" \
  --ip-version IPV6 --global
```

To check the IP addresses obtained:

```bash
$ gcloud compute addresses describe proxy-ipv4 --global
$ gcloud compute addresses describe proxy-ipv6 --global
```

Set these IP addresses as the A/AAAA records for both epp.\<nic.tld\> and whois.\<nic.tld\>, where \<nic.tld\> is the domain that was obtained earlier. (If you use [Cloud DNS](https://cloud.google.com/dns/) as your DNS provider, this step can also be performed with `gcloud`.)

#### Create load balancer frontend

The frontend receives traffic from the Internet and routes it to the backend service.

First create a TCP proxy (yes, it is confusing: this GCP resource is also called a "proxy"), which is a TCP termination point. Outside connections terminate on the TCP proxy, which establishes its own connection to the backend services defined above. As such, the source IP address from the outside is lost. However, the TCP proxy can add a [PROXY protocol header](https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) at the beginning of the connection to the backend. The proxy running on the backend can parse the header and obtain the original source IP address of a request.
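
For reference, a version 1 PROXY protocol header is a single text line prepended to the backend connection, carrying the protocol, source address, destination address, source port, and destination port, terminated by CRLF. With illustrative addresses, a header for a WHOIS connection might look like:

```
PROXY TCP4 198.51.100.22 203.0.113.7 54321 43
```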

Make one for each protocol (EPP and WHOIS):

```bash
# EPP
$ gcloud compute target-tcp-proxies create proxy-epp-proxy \
  --backend-service proxy-epp-loadbalancer --proxy-header PROXY_V1

# WHOIS
$ gcloud compute target-tcp-proxies create proxy-whois-proxy \
  --backend-service proxy-whois-loadbalancer --proxy-header PROXY_V1
```

Note the use of the `--proxy-header` flag, which turns on the PROXY protocol header.

Next, create the forwarding rule that routes outside traffic destined for a given IP address to the TCP proxy just created:

```bash
$ gcloud compute forwarding-rules create proxy-whois-ipv4 \
  --global --target-tcp-proxy proxy-whois-proxy \
  --address proxy-ipv4 --ports 43
```

The above command sets up a forwarding rule that routes traffic destined for the static IPv4 address reserved earlier, on port 43 (the actual port for WHOIS), to the TCP proxy that connects to the whois backend service.

Repeat the above command three more times to set up IPv6 forwarding for WHOIS and IPv4/IPv6 forwarding for EPP, as sketched below.
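
Assuming the EPP endpoint uses the standard EPP port 700 and the rule names follow the same pattern (both assumptions; adjust to your setup), the remaining three forwarding rules would look something like this:

```bash
# WHOIS over IPv6
$ gcloud compute forwarding-rules create proxy-whois-ipv6 \
  --global --target-tcp-proxy proxy-whois-proxy \
  --address proxy-ipv6 --ports 43

# EPP over IPv4
$ gcloud compute forwarding-rules create proxy-epp-ipv4 \
  --global --target-tcp-proxy proxy-epp-proxy \
  --address proxy-ipv4 --ports 700

# EPP over IPv6
$ gcloud compute forwarding-rules create proxy-epp-ipv6 \
  --global --target-tcp-proxy proxy-epp-proxy \
  --address proxy-ipv6 --ports 700
```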

## Additional steps

### Check if it all works

At this point the proxy should be working and reachable from the Internet. Check whether a WHOIS request to it succeeds:

```bash
$ whois -h whois.<nic.tld> something
```

One can also try to contact the EPP endpoint with an EPP client.
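
Short of a full EPP session, you can at least verify that the EPP endpoint terminates TLS and accepts a client certificate, for example with `openssl` (an informal check; a real registrar client certificate is needed for an actual EPP login):

```bash
# Opens a TLS connection to the EPP endpoint on port 700 using the registrar's
# client certificate and key; a successful handshake prints the server
# certificate chain and keeps the connection open.
$ openssl s_client -connect epp.<nic.tld>:700 \
  -cert <client-certificate.pem> -key <client-private.key>
```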

### Check logs and metrics

The proxy saves logs to [Stackdriver Logging](https://cloud.google.com/logging/), which is the same place that Nomulus saves its logs to. On the GCP console, navigate to Logging - Logs - GKE Container - <cluster name> - default. Do not choose "All namespace_id", as it includes logs from the Kubernetes system itself and can be quite overwhelming.

Metrics are stored in [Stackdriver Monitoring](https://cloud.google.com/monitoring/docs/). To view the metrics, go to the Stackdriver [console](https://app.google.stackdriver.com) (also accessible from the GCP console under Monitoring) and navigate to Resources - Metrics Explorer. Choose the resource type "GKE Container" and search for metrics with "/proxy/" in their names. Currently available metrics include total connection count, active connection count, request/response count, request/response size, round-trip latency, and quota rejection count.

### Cleanup sensitive files

Delete the service account key file and the SSL certificate private key, or store them in some secure location.