This PR changes the two flavors of OIDC authentication mechanisms to
verify the same audience, which allows the same token to pass both
mechanisms. Previously, the regular OIDC flavor used the project ID as
its required audience, which does not work for local user credentials
(such as those used by the nomulus tool), which require a valid OAuth
client ID as the audience when minting the token (a project ID is NOT a
valid OAuth client ID).
I considered allowing multiple audiences, but the result is not as clean
as just using the same audience everywhere, because the fall-through
logic would have generated a lot of noise from failed attempts.
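For reference, one quick way to check which audience a minted token
actually carries (generic tooling, not this PR's code; for local user
credentials the printed audience will be an OAuth client ID):
```
# Print the "aud" claim of the ID token minted for the active gcloud account.
ID_TOKEN="$(gcloud auth print-identity-token)"
curl -s "https://oauth2.googleapis.com/tokeninfo?id_token=${ID_TOKEN}" \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["aud"])'
```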
This PR also changes the client side to use OIDC tokens exclusively
whenever possible, including in the proxy, Cloud Scheduler, and Cloud
Tasks. The nomulus tool still uses an OAuth access token by default
because it requires USER-level authentication, which in turn requires us
to fill the User table with objects corresponding to the email address
of everyone needing access to the tool.
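As a rough illustration of the two credential types involved (the gcloud
equivalents, not the actual client code paths):
```
# OIDC ID token, as now sent by the proxy, Cloud Scheduler, and Cloud Tasks:
gcloud auth print-identity-token

# OAuth access token, which the nomulus tool still uses by default:
gcloud auth print-access-token
```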
TESTED=verified each client is able to make authenticated calls on QA with or
without IAP.
* Remove dos.xml from the configs
We don't have a dos config right now, and applying dos settings via
"gcloud app deploy" is deprecated and has started causing problems.
If we add dos configs later, they should be applied with "gcloud app
firewall-rules".
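A hedged example of the firewall-rules approach (the priority, range, and
description below are placeholders):
```
# Deny a source range at a chosen priority instead of shipping dos.xml
# with "gcloud app deploy".
gcloud app firewall-rules create 1000 \
  --action=deny \
  --source-range="192.0.2.0/24" \
  --description="block example abusive range"
```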
* Use SecretManager for nomulus-tool-cloudbuild cred
Store cloudbuild's nomulus-tool credential in SecretManager and make the
deployment pipeline load it from SecretManager.
The tool-credential.json.enc file in the
gs://domain-registry-dev-deploy/secrets folder is no longer needed.
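A sketch of the load step, assuming a hypothetical secret name:
```
# Fetch the latest version of the stored credential from SecretManager.
gcloud secrets versions access latest \
  --secret=nomulus-tool-cloudbuild-credential \
  > tool-credential.json
```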
There has been a case where the CI was broken on a Friday and no one
noticed or fixed it, so an RC build was built with broken tests.
The tests were disabled due to unknown test failures that have since
been fixed.
Also update the machine type used by GCB to be more powerful. This is
necessary for the tests to pass, because N1_HIGHCPU_8 is RAM-constrained
and the tests crash. I updated all jobs to use the new type, which will
hopefully make the build faster as well.
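For what it's worth, a build can also be exercised on a larger worker
from the command line; the machine type below is illustrative, not
necessarily the exact type the jobs were switched to:
```
# Submit the build on a larger (less RAM-constrained) machine type.
gcloud builds submit --config=cloudbuild.yaml --machine-type=n1-highcpu-32 .
```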
This is similar to the migration of the spec11 pipeline in #1073. Also removed
a few Dagger providers that are no longer needed.
TESTED=tested the dataflow job on alpha.
* Migrate Spec11 pipeline to flex template
Unfortunately this PR has turned out to be much bigger than I initially
conceived. However, there is no good way to split it up because the
changes are intertwined. This PR includes 3 main changes:
1. Change the spec11 pipeline to use a Dataflow Flex Template (a launch
sketch follows this list).
2. Retire the use of the old JPA layer that relies on credential saved
in KMS.
3. Some extensive refactoring to streamline the logic and improve test
isolation.
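A hedged sketch of launching from the Flex Template (the job name,
bucket, and parameters are placeholders, not the actual spec11 values):
```
# Launch a job from a previously built Flex Template spec on GCS.
gcloud dataflow flex-template run "spec11-$(date +%Y%m%d%H%M%S)" \
  --template-file-gcs-location=gs://my-templates-bucket/spec11.json \
  --region=us-central1 \
  --parameters=registryEnvironment=ALPHA
```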
* Fix job name and remove projectId from options
* Add parameter logs
* Set RegistryEnvironment
* Remove logging and modify safe browsing API key regex
* Rename a test method and rebase
* Remove unused Junit extension
* Specify job region
* Maintain a release-to-Version map in deployment
Keep track of the mapping between Nomulus release tags and AppEngine
version IDs with a mapping file. This is necessary because AppEngine
does not support custom versioning. With this mapping, rollbacks could
be automated. Automation of rollbacks is important because there is
test-supporting metadata that must be updated but is easily forgotten.
During the last stage of deployment, the current per-service version IDs
are fetched using gcloud and appended to a file on GCS. Each line
is of the format "{RELEASE_TAG},{APPENGINE_SERVICE},{APPENGINE_VERSION}".
This change has been tested on the crash environment. The rollback
script is still a work in progress.
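A minimal sketch of the mapping-file update, assuming placeholder bucket
and object names (gsutil cannot append, so the file is fetched, extended,
and re-uploaded; the exact filter may need tuning):
```
# Append one "{RELEASE_TAG},{SERVICE},{VERSION}" line per GAE service.
gsutil cp gs://my-deploy-bucket/nomulus-version-map.csv /tmp/map.csv || touch /tmp/map.csv
for service in default pubapi backend tools; do
  version="$(gcloud app versions list --service="${service}" \
      --filter='traffic_split=1' --format='value(id)')"
  echo "${RELEASE_TAG},${service},${version}" >> /tmp/map.csv
done
gsutil cp /tmp/map.csv gs://my-deploy-bucket/nomulus-version-map.csv
```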
The current setup causes the GCB job to fail validation and not run because it
uses backticks in the configuration yaml, which is not allowed -- there is no
shell to perform backtick substitution. See the error message here:
https://spinnaker.endpoints.domain-registry-dev.cloud.goog/gate/pipelines/01EF5GRMD625613H6Z033DBD3Z
In the future please make sure to test the GCB pipeline as instructed in
the comments at the beginning of each file before committing.
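For context, backtick (or $(...)) substitution only happens inside a
shell; the usual fix is to have the build step invoke one explicitly,
along these lines (paths illustrative):
```
# Without a shell, `...` and $(...) are passed to the program literally.
# Wrapping the command in bash -c restores command substitution.
bash -c 'echo "current tag: $(cat /workspace/tag)"'
```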
I tried to work around it by downloading the nomulus tool jar file
instead (running the nomulus-tool docker image inside another docker
container is not advisable). However the "nomulus deploy_spec11_pipeline"
command
still fails. I'm not sure why. Has the command itself been tested
locally? The error message is shown below:
```
Step #2: Aug 09, 2020 3:11:46 AM org.apache.beam.runners.dataflow.DataflowRunner fromOptions
Step #2: WARNING: --region not set; will default to us-central1. Future releases of Beam will require the user to set the region explicitly. https://cloud.google.com/compute/docs/regions-zones/regions-zones
Step #2: Aug 09, 2020 3:11:46 AM org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory tryCreateDefaultBucket
Step #2: INFO: No tempLocation specified, attempting to use default bucket: dataflow-staging-us-central1-937378958468
Step #2: Aug 09, 2020 3:11:47 AM org.apache.beam.sdk.extensions.gcp.util.RetryHttpRequestInitializer$LoggingHttpBackOffHandler handleResponse
Step #2: WARNING: Request failed with code 409, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes, HTTP framework says request can be retried, (caller responsible for retrying): https://www.googleapis.com/storage/v1/b?predefinedAcl=projectPrivate&predefinedDefaultObjectAcl=projectPrivate&project=domain-registry-alpha.
Step #2: Exception in thread "main"
Step #2: java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
Step #2: at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:224)
Step #2: at org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:155)
Step #2: at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:55)
Step #2: at org.apache.beam.sdk.Pipeline.create(Pipeline.java:147)
Step #2: at google.registry.beam.spec11.Spec11Pipeline.deploy(Spec11Pipeline.java:157)
Step #2: at google.registry.tools.DeploySpec11PipelineCommand.run(DeploySpec11PipelineCommand.java:80)
Step #2: at google.registry.tools.RegistryCli.runCommand(RegistryCli.java:257)
Step #2: at google.registry.tools.RegistryCli.run(RegistryCli.java:182)
Step #2: at google.registry.tools.RegistryTool.main(RegistryTool.java:129)
Step #2: Caused by: java.lang.reflect.InvocationTargetException
Step #2: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Step #2: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Step #2: at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Step #2: at java.base/java.lang.reflect.Method.invoke(Method.java:566)
Step #2: at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:214)
Step #2: ... 8 more
Step #2: Caused by: java.lang.IllegalArgumentException: Unable to use ClassLoader to detect classpath elements. Current ClassLoader is jdk.internal.loader.ClassLoaders$AppClassLoader@5cb0d902, only URLClassLoaders are supported.
Step #2: at org.apache.beam.runners.core.construction.PipelineResources.detectClassPathResourcesToStage(PipelineResources.java:58)
Step #2: at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:285)
Step #2: ... 13 more
```
Lastly, the "--project" flag refers to the KMS project. While I'm not
sure which project that is, I don't think we can use the PROJECT_ID
variable, as that is a GCB-substituted variable referring to the project
that the GCB job runs in, which in our case means domain-registry-dev.
We shouldn't use that project for KMS. I've changed it to the same
project as the one we are deploying to, but please note that we have a
separate project, ${project_id}-keys, that is used for all KMS purposes.
This is specified in the config file, so if that's what you meant to
use, there is no need to specify it on the command line. And if you
meant to use the project being deployed to for KMS, it also shouldn't be
necessary to specify it separately, as this information is already known
once you specify "nomulus -e ENV".
https://team.git.corp.google.com/domain-registry-eng/nomulus-internal/+/refs/heads/master/core/src/main/java/google/registry/config/files/nomulus-config-production.yaml#168
Can you add more description of what the KMS project is supposed to be?
I don't think we specify a project for KMS purposes in any other
commands.
Given that there are several unresolved issues, I've commented out my
proposed solution so that deployment can proceed.
* Replace jpaTm with a JpaSupplierFactory
* Style
* Style
* Pipeline takes in a SerializableSupplier instead
* Change the ordering of imports
* Test a good domain in addition to a bad one
* Rename and check good domain for Transact Answer
* Use standard Mockito verify
* Verify transact call and no more interactions
* Remove Answer comment
* Naming changes
* Deploy Spec 11 pipeline correctly
* Fix formatting of deploy file
* Use a file to persist state across Cloud Build steps
Co-authored-by: Gus Brodman <gbrodman@google.com>
* Work around Spinnaker issue wrt variables
Cloud Build variable references need to stay away from the ${var}
pattern to prevent Spinnaker from trying to resolve them. In all files
that are used by Spinnaker, we change variable references to the $var
form. We made the minimum amount of change possible, and will revisit
this issue once a permanent solution is available.
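To make the distinction concrete (the variable name is illustrative):
both forms are equivalent to Cloud Build's substitution engine, but only
the second survives Spinnaker, which treats ${...} as its own expression
syntax:
```
echo "deploying ${_TAG_NAME}"   # rewritten away from this form
echo "deploying $_TAG_NAME"     # to this form in Spinnaker-managed files
```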
Defined a Docker image for schema deployment.
Included the schema deployer docker image in the Cloud Build release process.
Defined a Cloud Build config for schema deployment.
TESTED=Used cloud-build-local to test deployment flow
TESTED=Used docker to test schema deployer image in more ways
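For reference, a typical cloud-build-local invocation of the sort used
for the testing above (the config path is illustrative):
```
# Actually execute the build locally instead of only linting the config.
cloud-build-local --config=release/cloudbuild-schema-deploy.yaml \
  --dryrun=false .
```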
* Save release tag during deployment
* Save current tag for every environment
Store tag of the current deployment in each environment.
This is used by the server-sql compatibility test.
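A minimal sketch of the tag bookkeeping, assuming placeholder bucket and
object names:
```
# Record the tag currently deployed to this environment on GCS.
echo "${RELEASE_TAG}" | gsutil cp - "gs://my-deploy-bucket/${ENVIRONMENT}.tag"
```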
* Merge beam and GAE configs deployment to one GCB job
Deployment of GAE configs requires the credential used by gcloud to
have the GAE Admin role on the project being managed. We do not want to
grant the GCB service account that role, because it would allow *any*
GCB job to deploy anything to GAE. Instead we use a dedicated credential
originally created to deploy beam pipelines. This credential is
encrypted by KMS and stored on GCS. Since the beam pipeline deployment
GCB job already does the decryption, it makes sense to add the config
deployment step there as well. The beam deployment steps are tweaked to
use the nomulus tool docker image instead of the jar file.
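A sketch of the shared decryption step (the keyring and key names are
assumptions; the GCS path is the one mentioned earlier):
```
# Fetch and decrypt the dedicated deployment credential.
gsutil cp gs://domain-registry-dev-deploy/secrets/tool-credential.json.enc .
gcloud kms decrypt \
  --ciphertext-file=tool-credential.json.enc \
  --plaintext-file=tool-credential.json \
  --location=global \
  --keyring=nomulus-tool-keyring \
  --key=nomulus-tool-key
```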
Also moved the content of deploy_configs_to_env.sh into the GCB yaml
file itself, because the shell script is not uploaded to GCB at the same
time as the yaml file when the job is triggered by Spinnaker.
Lastly, due to b/137891685, using GCB to deploy cron jobs does not work,
as we cannot use a service account credential to deploy to projects
under google.com.