Commit graph

113 commits

Author SHA1 Message Date
Pavlo Tkach
572b7101cb
Create separate BSA service (#2221) 2023-11-15 18:38:26 -05:00
Weimin Yu
7ab76f3573
Pin Flyway tool jar to 9.22.3 (#2222)
Flyway 10+ is not compatible with Java 8.

Rollback this change after we upgrade to Java 11.
2023-11-15 14:48:55 -05:00
Pavlo Tkach
e647d4e215
Add retry to cloud build node installation (#2210) 2023-11-06 09:15:36 -05:00
Pavlo Tkach
87e99f59bc
Replace node.js installation method in build.sh (#2206) 2023-11-02 14:17:18 -04:00
sarahcaseybot
bf3bb5d804
Add a Cloud Build job for syncing Tld configuration files from the internal repo with the database (#2174)
* Add a cloudbuild-tld-sync job

This job checks the Tld config files in the internal repo and syncs them with the actual Tld objects in the database using the configure_tld numulus command.

* Add the dockerfile and shell script

* Force the command

* Add comments

* add newline

* Create a separate copy of the job for each environment

* fix file name

* Fix indentation
2023-10-19 14:01:40 -04:00
Weimin Yu
f54bec7553
Add docs for Cloud Build status notification (#2157)
Add documentation that describes the current Cloud Build status notification
to Google Chat, as well as how to update the configuration and the
notifier service.
2023-09-29 10:49:15 -04:00
Lai Jiang
6047c16f3e
Make Kythe work with Gradle 8 (#2141)
Mostly implementing the fix suggested by b/294850265. Tested by
submitting a job to GCB which ran successfully.
2023-09-12 10:47:57 -04:00
sarahcaseybot
655f05c58c
Remove references to cloud-build-local (#2111)
* Update cloudbuild-nomulus to save standardTest logs to GCS

* Remove step changes from cloudbuild-nomulus
2023-08-17 15:26:41 -04:00
Pavlo Tkach
95c810ddf4
Add script to allow quickly update number of instances (#2112)
This is a fast and easy way to update number of instances for the service deployed to app engine. Works with manual-scaling types services.
2023-08-17 12:33:35 -04:00
Lai Jiang
fdfbb9572d
Refactor OIDC-based auth mechanism (#2049)
This PR changes the two flavors of OIDC authentication mechanisms to
verify the same audience. This allows the same token to pass both
mechanisms. Previously the regular OIDC flavor uses the project id as
its required audience, which does not work for local user credentials
(such as ones used by the nomulus tool), which requires a valid OAuth
client ID as audience when minting the token (project id is NOT a valid
OAuth client ID).

I considered allowing multiple audiences, but the result is not as clean
as just using the same everywhere, because the fall-through logic would
have generated a lot of noises for failed attempts.

This PR also changes the client side to solely use OIDC token whenever
possible, including the proxy, cloud scheduler and cloud tasks. The nomulus
tool still uses OAuth access token by default because it requires USER level
authentication, which in turn requires us to fill the User table with objects
corresponding to the email address of everyone needing access to the tool.

TESTED=verified each client is able to make authenticated calls on QA with or
without IAP.
2023-06-27 13:10:31 -04:00
Pavlo Tkach
55243e7cf6
Adds cloud scheduler and tasks deployer (#1999) 2023-05-04 15:57:32 -04:00
Weimin Yu
61ab29ae9e
Prober ssl cert update automation (#2019)
Defined CloudBuild script and docker image that automatically
updates probers' SSL certs
2023-05-03 15:57:50 -04:00
gbrodman
ff8a08f40e
Fix typo in pipeline name (#2016) 2023-04-28 17:05:24 -04:00
gbrodman
a341058282
Refactor / rename Billing object classes (#1993)
This includes renaming the billing classes to match the SQL table names,
as well as splitting them out into their own separate top-level classes.
The rest of the changes are mostly renaming variables and comments etc.

We now use `BillingBase` as the name of the common billing superclass,
because one-time events are called BillingEvents
2023-04-28 14:27:37 -04:00
Lai Jiang
bd0cea0d87
Remove AppEngineServiceUtils (#2003)
The only method that is called from this class is setNumInstances. However we
don't current use `nomulus set_num_instances` anywhere. If we need to change
the number of instances, it is either done by updating appengine-web.xml, which
is deployed by Spinnaker, or doing it manually as a break-glass fix via gcloud
or on Pantheon.
2023-04-21 10:11:12 -04:00
Lai Jiang
5ec73f3809
Refactor contact history PII wipeout logic into a Beam pipeline (#1994)
Because we need to check if a contact history is the most recent for its
underlying contact resource, the query-wipe out-repeat loop no longer works
ideally due to the added overhead with the query.

Instead, we refactor the logic into a Beam pipeline where the query only
needs to be performed once and history entries eligible for wipe out are
handled individually in their own transforms. Because history entries
are otherwise immutable, we can run the pipeline in relatively relaxed
repeatable read isolation level. We also do not worry about batching for
performance, as we do not anticipate this operation to put a lot of
strains on the particular table.
2023-04-19 13:04:45 -04:00
Pavlo Tkach
055a52f67e
Trim cloud scheduler config url value before submitting (#1988) 2023-04-10 19:05:32 -04:00
Pavlo Tkach
425ecdcd87
Add disable_runner_v2 to pipeline options (#1976) 2023-03-30 17:10:37 -04:00
Lai Jiang
b9742adc0b
Delete cron.xml (#1965)
We've successfully migrated to using Cloud Scheduler.
2023-03-23 14:29:06 -04:00
Pavlo Tkach
3108e8a871
Use builder image as a base for schema-deployer and schema-verifier (#1955) 2023-03-13 15:37:02 -04:00
Pavlo Tkach
71a8579ece
Move App Engine cron jobs to cloud scheduler (#1939) 2023-03-01 13:40:56 -05:00
Lai Jiang
ef3ce79b8a
Install procps in schema-deployer image (#1934)
It turns out this one uses pgrep and pkill as well, go figure...

<!-- Reviewable:start -->
- - -
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/google/nomulus/1934)
<!-- Reviewable:end -->
2023-02-08 09:59:47 -05:00
Lai Jiang
a53b71ecd5
Install procps (#1932)
The schema verifier script needs pgrep and pkill, which do not come with
Debian.
2023-02-06 19:45:04 -05:00
Lai Jiang
fc9446876f
Install curl (#1931)
Tested by running "docker build .".
2023-02-06 16:45:52 -05:00
Lai Jiang
14d68d4cb2
Change base image for schema-verifier and schema-deployer (#1930)
Ubuntu 18.04 is entering EOL and the Cloud Build jobs are failing,
seemingly due to connection error to 18.04 repos:

https://pantheon.corp.google.com/cloud-build/builds;region=global/126a7c90-4322-41f1-ba1c-a10e38a32dab;step=5?project=domain-registry-dev

We use Debian 10 for the main builder, so it's better to keep everything
on the same schedule:

https://cs.opensource.google/nomulus/nomulus/+/master:release/builder/Dockerfile

Debian 10 is supported till June 2024:

https://wiki.debian.org/LTS
2023-02-06 13:09:37 -05:00
Lai Jiang
ac14688a4f
Do not deploy datastore index file (#1913)
The index was deleted in #1905.

<!-- Reviewable:start -->
- - -
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/google/nomulus/1913)
<!-- Reviewable:end -->
2023-01-18 16:31:35 -05:00
Lai Jiang
9dab1e86ec
Add a beam pipeline to expand recurring billing event (#1881)
This will replace the ExpandRecurringBillingEventsAction, which has a
couple of issues:

1) The action starts with too many Recurrings that are later filtered out
   because their expanded OneTimes are not actually in scope. This is due
   to the Recurrings not recording its latest expanded event time, and
   therefore many Recurrings that are not yet due for renewal get included
   in the initial query.

2) The action works in sequence, which exacerbated the issue in 1) and
   makes it very slow to run if the window of operation is wider than
   one day, which in turn makes it impossible to run any catch-up
   expansions with any significant gap to fill.

3) The action only expands the recurrence when the billing times because
   due, but most of its logic works on event time, which is 45 days
   before billing time, making the code hard to reason about and
   error-prone.  This has led to b/258822640 where a premature
   optimization intended to fix 1) caused some autorenwals to not be
   expanded correctly when subsequent manual renews within the autorenew
   grace period closed the original recurrece.

As a result, the new pipeline addresses the above issues in the
following way:

1) Update the recurrenceLastExpansion field on the Recurring when a new
   expansion occurs, and narrow down the Recurrings in scope for
   expansion by only looking for the ones that have not been expanded for
   more than a year.

2) Make it a Beam pipeline so expansions can happen in parallel. The
   Recurrings are grouped into batches in order to not overwhelm the
   database with writes for each expansion.

3) Create new expansions when the event time, as opposed to billing
   time, is within the operation window. This streamlines the logic and
   makes it clearer and easier to reason about. This also aligns with
   how other (cancelllable) operations for which there are accompanying
   grace periods are handled, when the corresponding data is always
   speculatively created at event time. Lastly, doing this negates the
   need to check if the expansion has finished running before generating
   the monthly invoices, because the billing events are now created not
   just-in-time, but 45 days in advance.

Note that this PR only adds the pipeline. It does not switch the default
behavior to using the pipeline, which is still done by
ExpandRecurringBillingEventsAction. We will first use this pipeline to
generate missing billing events and domain histories caused by
b/258822640. This also allows us to test it in production, as it
backfills data that will not affect ongoing invoice generation. If
anything goes wrong, we can always delete the generated billing events
and domain histories, based on the unique "reason" in them.

This pipeline can only run after we switch to use SQL sequence based ID
allocation, introduced in #1831.
2023-01-09 17:41:56 -05:00
Lai Jiang
827b7db227
Make Kythe run work with Gradle 7 (#1727)
The fix is based on b/240627423. I tested locally and was able to build
with the -PenableCrossReferencing=true flag successfully.

TESTED=run the kythe GCB pipeline locally.
2022-08-02 13:19:47 -04:00
gbrodman
72abc824d5
Delete DatastoreTM and most other references to Datastore (#1681)
This includes:
- deletion of helper DB methods in tests
- deletion of various old Datastore-only classes and removal of any
  endpoints
- removal of the dual-database test concept
- removal of 'ofy' from the AppEngineExtension
2022-07-01 13:33:38 -04:00
Lai Jiang
62236f7581
Only use GPG2 in tests (#1676)
GPG1 is deprecated and stuck in v1.4 from 2018. GPG2 is recommended. We
only use the GPG binary in tests and when the host system has both
versions it causes problems because we hardcode the GPG import command
in GpgSystemCommandExension to use the binary named "gpg", which could
be linked to either GPG1 or GPG2, causing the other test to fail when
the version of GPG that runs in tests is incompatible with the version of GPG
that imports the keys.

With this PR we only support GPG2 from now on.
2022-06-22 11:03:41 -04:00
gbrodman
2f8be045c7
Delete code relating to SQL init and scheduling (#1661)
One of the more significant changes introduced in this PR is that we use
SQL as the backing database in all tests unless otherwise specified,
e.g. by using the TmOverrideExtension. We change various ofy-related
tests to use this.

This includes various changes:
- Deletion of SqlEntity/DatastoreEntity and related classes. Includes
  any necessary changes because of that (e.g. getting a nice SQL key on
  error in RegistryJpaIO).
- Deletion of classes that used libraries from the init-sql code
  (RefreshDnsOnHostRenameAction)
- Removal of the JpaTransactionManager's backup implementation
- Modification of RegistryJpaWriteTest to not use init-sql code
- Removal of the Transaction class and related classes, however it does
  not remove the TransactionEntity class as that would require DB
  changes
- Removal of anything related to the actual usage of the database
  migration schedule or read-only phases
- Various test changes and fixes to account for the differences in SQL
  (like how foreign keys need to exist)

This deliberately doesn't do anything to alter the objects actually
stored in the DB yet, just how we use them
2022-06-13 15:10:35 -04:00
gbrodman
623356b1e8
Remove functional SQL<->DS replay code (#1659)
This includes:
- removing the actions that do the replay
- removing the tests for the replay
- removing the ReplayExtension and adjusting the various tests that used
  it appropriately
- removing functionality relating to "things that happen during replay",
  e.g. beforeSqlSaveOnReplay

This does not include:
- removing the InitSqlPipeline or similar tasks
- removing e.g. SqlEntity (it's used in other places)
- removing Transforms/RegistryJpaIO and other SQL-pipeline-creation code
2022-06-09 07:44:01 -04:00
Weimin Yu
4f69e1e0a6
Remove bracket in Cloud Build script (#1658)
* Remove bracket around varname in CloudBuild script

Due to spinnaker restriction: it cannot handle variable references where the var name has brackets around it.

Added spinnaker error message to the comments
2022-06-08 13:58:56 -04:00
Weimin Yu
bee5e0a5a9
Verify schema using Cloud Build (#1627)
* Add tool to compare  golden and actual schema
2022-05-16 16:10:09 -04:00
Weimin Yu
d361f7cf18
Tag nomulus-tool in schema deployment script (#1621)
* Tag nomulus-tool in schema deployment script
2022-05-05 12:46:32 -04:00
Lai Jiang
f4436b54cf
Do not delete build cache when building release candidates (#1619)
We would like to re-use the build cache when building RCs for different
environments. There's not much practical use in doing a "clean" for
every build when Gradle should be able to figure out which artifacts
need to be rebuilt. It also does not make sense to build each
environment in a separate step, which also introduces redunency because
not all artifacts are cached across steps. The build cache is enabled by
default.

Lastly, the cache needs to be inside the /workspace folder, which is the
default persisted storage location.

TESTED=tried to build the RCs on alpha and saved about 10 min.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/google/nomulus/1619)
<!-- Reviewable:end -->
2022-05-04 12:08:10 -04:00
Lai Jiang
4ec8b71f42
Increase Nomulus build timeout (#1613)
We have recently started to routinely breach the 1h timeout. Increasing
this value to 2h. We should also look into reusing the artifacts when
building RCs for different environments.
2022-05-02 16:11:11 -04:00
Michael Muller
7716eebfff
Change check for root directory during rollback (#1602)
* Change check for root directory during rollback

`rollback_tool` tries to infer the root of the nomulus tree by checking for a
directory named "nomulus".  This is potentially problematic (and, indeed, was
for me) since there is no guarantee what that directory will be named.

There are a number of features that characterize the root directory.  Check
for the presence of the `rollback_tool` wrapper script, as this is both at
root level and tightly coupled to the python code, so hopefully we won't
move it without testing that the script still works.
2022-04-25 12:39:16 -04:00
gbrodman
073d0a416a
Create a Dataflow pipeline to resave EPP resources (#1553)
* Create a Dataflow pipeline to resave EPP resources

This has two modes.

If `fast` is false, then we will just load all EPP resources, project them to the current time, and save them.

If `fast` is true, we will attempt to intelligently load and save only resources that we expect to have changes applied when we project them to the current time. This means resources with pending transfers that have expired, domains with expired grace periods, and non-deleted domains that have expired (we expect that they autorenewed).
2022-04-15 15:46:35 -04:00
Weimin Yu
65c2570b8f
Remove dos.xml from the configs (#1587)
* Remove dos.xml from the configs

We don't have dos config right now, and applying dos from "gcloud app
deploy" is deprecated and has started causing problems.

If we add dos configs, it should be using "gcloud app firewall-rules".
2022-04-11 15:22:42 -04:00
Weimin Yu
4f33de10f3
Add a tools command to launch SQL validation job (#1526)
* Add a tools command to launch SQL validation job

Stopping using Pipeline.run().waitUntilFinish in
ValidateDatastorePipeline. Flex-templalate does not support blocking
wait in the main thread.

This PR adds a new ValidateSqlCommand that launches the pipeline and
maintains the SQL snapshot while the pipeline is running.

This PR also added more parameters to both ValidateSqlCommand and
ValidateDatastoreCommand:
- The -c option to supply an optional incremental comparison start time
- The -r option to supply an optional release tag that is not 'live',
  e.g., nomulus-DDDDYYMM-RC00

If the manual launch option (-m) is enabled, the commands will print the
gcloud command that can launch the pipeline.

Tested with sandbox, qa and the dev project.
2022-02-28 13:14:57 -05:00
Lai Jiang
fa9b784c5c
Correctly delete all stopped versions except for the most recent 3 (#1511)
The gcloud command does some weird stuff with sorting when custom format
is used. Here we instead rely on linux sort and head command to sort the
versions list.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/google/nomulus/1511)
<!-- Reviewable:end -->
2022-02-03 16:04:58 -05:00
Weimin Yu
1253fa479a
Release ValidateSqlPipeline as container image (#1504)
* Release ValidateSqlPipeline as container image
2022-01-28 14:57:31 -05:00
Weimin Yu
5f0dd24906
Release ValidateDatastorePipeline (#1501)
* Release ValidateDatastorePipeline
2022-01-26 13:38:19 -05:00
Lai Jiang
ceade7f954
Use the service account credential to delete unused versions (#1484) 2022-01-07 11:06:19 -05:00
Lai Jiang
f6920454f6
Fix the beam staging script, take 3 (#1370)
The number of arguments changed in https://github.com/google/nomulus/pull/1369, so the check needs to change as well.

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/google/nomulus/1370)
<!-- Reviewable:end -->
2021-10-04 16:44:32 -04:00
Lai Jiang
9103216a46
Fix beam deployment script again. (#1369)
uberjar task and uberjar name are now different (beamPipelineCommon and
beam_pipeline_common, respectively). This is more idiomatic with regard
to naming conventions but we need to take two different variables now.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/google/nomulus/1369)
<!-- Reviewable:end -->
2021-10-04 14:23:28 -04:00
Lai Jiang
737f65bd33
Change Beam uber jar name in Nomulu release GCB config (#1367)
The uber jar name was changed in #1351.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/google/nomulus/1367)
<!-- Reviewable:end -->
2021-10-04 10:47:27 -04:00
Michael Muller
420a0b8b9a
Use debian10 image for builder, not ubuntu1804 (#1345)
The debian10 image is generally a bit more recent and, in particular, includes
python 3.7.3, which we're currently using as a baseline for our builds.
2021-09-28 14:49:13 -04:00
Lai Jiang
047444831b
Add a Beam pipeline to generate RDE deposit (part 1) (#1219)
This is the first part of the RdeStagingAction SQL migration where the
mapper logic is implemented in Beam.

A few helper methods are added to convert the DomainContent, HostBase
and ContactBase to their respective terminal child classes. This is
necessary and possible because the child classes do not have extra
fields and the base classes exist only to be embedded to other entities
(such as the various HistoryEntry entities). The conversion is necessary
because most of our code expects the terminal classes, such as the
RdeMarshaller's various marshallXXX() methods. The alternative would be
to change all the call sites, which seems to be much more disruptive.

Unfortunately there is is no good way to do this conversion than just
creating a builder and setting every fields there is.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/google/nomulus/1219)
<!-- Reviewable:end -->
2021-06-30 13:54:24 -04:00