* Disable Ofy tests.
This change just turns off the Ofy tests at the root, by removing processing
for dual tests and disassociating the TestOfyOnly annotation from test
annotations.
This is far less comprehensive than #1631, but it's probably worth submitting
as a stopgap solution because it should speed up our test runs and unblock a
lot of other cleanup work.
* Fix DualDatabaseTestInvocationContextProviderTest
* Optimize RDAP entity event query
For each EPP entity, directly load the latest HistoryEntry per event type
instead of loading all events through the HistoryEntryDao.
Although most entities have a small number of history entries, a few have
enough entries to cause OutOfMemoryErrors when loaded all at once.
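A sketch of the per-type query in plain JPA; the entity and field names mirror
nomulus conventions but the actual DAO code differs:

```java
import java.util.Optional;
import javax.persistence.EntityManager;

// Fetch only the most recent HistoryEntry of one event type for one EPP entity,
// instead of materializing the entity's entire history.
static Optional<HistoryEntry> loadMostRecentOfType(EntityManager em, String repoId, String type) {
  return em.createQuery(
          "SELECT h FROM HistoryEntry h WHERE h.repoId = :repoId AND h.type = :type"
              + " ORDER BY h.modificationTime DESC",
          HistoryEntry.class)
      .setParameter("repoId", repoId)
      .setParameter("type", type)
      .setMaxResults(1) // LIMIT 1: only the latest event of this type is loaded
      .getResultStream()
      .findFirst();
}
```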
We have backend max-instances set to 100, which apparently exceeds the default
quota for GAE. Add info to the configuration doc on updating the quota or
changing this parameter.
* Add batching to ExpandRecurringBillingEventsAction
It OOMs when trying to load, all at once, every single BillingRecurrence that
needs to be expanded (which is to be expected), so this processes them in
transactional batches of 50 instead.
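A hedged sketch of the batching shape (the entity type, key loading, and
`expand()` helper are illustrative stand-ins, not the actual action code):

```java
import static google.registry.persistence.transaction.TransactionManagerFactory.tm;

import com.google.common.collect.Iterables;
import google.registry.persistence.VKey;
import java.util.List;

// One transaction per batch bounds how much data is held in memory at once.
private static final int BATCH_SIZE = 50;

void expandInBatches(List<VKey<Recurring>> recurrenceKeys) {
  for (List<VKey<Recurring>> batch : Iterables.partition(recurrenceKeys, BATCH_SIZE)) {
    tm().transact(() -> tm().loadByKeys(batch).values().forEach(this::expand));
  }
}
```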
* Cleanup gpg-agent instances and home directories
The GpgSystemCommandExtension leaks home directories and, more importantly,
gpg-agent instances. This can cause problems with inotify limits, since the
agent seems to make use of inotify. Do a proper cleanup in afterEach().
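A sketch of such an afterEach(), assuming a `gnupgHome` field holding the
test's temporary GNUPGHOME (`gpgconf --kill gpg-agent` is the documented way
to stop an agent for a given home directory):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

private File gnupgHome; // the extension's temporary GNUPGHOME directory

void afterEach() throws IOException, InterruptedException {
  // Stop the per-test agent before removing its home directory.
  ProcessBuilder killAgent = new ProcessBuilder("gpgconf", "--kill", "gpg-agent");
  killAgent.environment().put("GNUPGHOME", gnupgHome.getAbsolutePath());
  killAgent.start().waitFor(); // per the follow-up commit, don't fail the test if this fails
  // Recursively delete the home directory (children before parents).
  try (Stream<Path> paths = Files.walk(gnupgHome.toPath())) {
    paths.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
  }
}
```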
* Don't fail if we can't kill the agent
We'll delete the associated code soon enough too, but it's safer to delete the
cron jobs first and run in that state for a week, so we can still run them
manually if need be.
* Reduce the number of manually scaled instances for default/pubapi
This is in the spirit of "not always running significantly over-provisioned",
which helps save costs and also exposes potential scaling issues while they are
still small, rather than all at once when they're a big problem.
This can always be reverted if necessary, and can be instantaneously adjusted by
running the `nomulus set_num_instances` command.
* Don't enforce billing account map check on TEST TLDs
This was affecting monitoring (i.e. prober TLDs). Note that test TLDs are
already excluded from the billing account map check in the Registrar builder()
method (see PR #1601), but we forgot to make that same test TLD exclusion in the
EPP flows check (see PR #1605).
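A hedged sketch of what the flow-side exclusion looks like;
`verifyBillingAccountPresent` is a hypothetical stand-in for the actual check:

```java
import google.registry.model.registrar.Registrar;
import google.registry.model.registry.Registry;
import google.registry.model.registry.Registry.TldType;

// Mirror the Registrar builder's behavior: skip the billing account map check
// entirely for TEST TLDs (e.g. prober TLDs).
static void checkBillingAccountMap(Registrar registrar, String tld) {
  if (Registry.get(tld).getTldType() == TldType.TEST) {
    return; // test/prober TLDs carry no billing accounts by design
  }
  verifyBillingAccountPresent(registrar, tld); // hypothetical existing check
}
```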
* Add test for Java 8 Compatibility
Add a test to check for Java 8 compatibility of jars deployed to
AppEngine.
It is not enough to run the existing tests with a Java 8 VM, since many API
jars are not exercised by tests, e.g., those for GCP services like the
SecretManager.
We take the conservative approach and verify that every class in every jar is
compiled for Java 8.
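A minimal sketch of the check, assuming the test walks each deployed jar (the
real test's structure may differ): a class file's major version is stored
right after the 0xCAFEBABE magic number, and Java 8 corresponds to major
version 52.

```java
import java.io.DataInputStream;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

static void assertJarIsJava8Compatible(JarFile jar) throws Exception {
  for (Enumeration<JarEntry> entries = jar.entries(); entries.hasMoreElements(); ) {
    JarEntry entry = entries.nextElement();
    // Multi-release jars may legitimately ship newer classes under META-INF/versions.
    if (!entry.getName().endsWith(".class")
        || entry.getName().startsWith("META-INF/versions/")) {
      continue;
    }
    try (InputStream in = jar.getInputStream(entry);
        DataInputStream data = new DataInputStream(in)) {
      if (data.readInt() != 0xCAFEBABE) {
        continue; // not actually a class file
      }
      data.readUnsignedShort(); // minor version, unused
      int major = data.readUnsignedShort();
      if (major > 52) { // 52 = Java 8
        throw new AssertionError(
            entry.getName() + " targets class file major version " + major);
      }
    }
  }
}
```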
We would like to re-use the build cache when building RCs for different
environments. There's not much practical use in doing a "clean" for every
build when Gradle should be able to figure out which artifacts need to be
rebuilt. Nor does it make sense to build each environment in a separate step,
which introduces redundancy because not all artifacts are cached across steps.
The build cache is enabled by default.
Lastly, the cache needs to be inside the /workspace folder, which is the
default persisted storage location.
TESTED=tried to build the RCs on alpha and saved about 10 min.
* Add missing transaction for whois lookups
Nameserver whois lookups are failing under SQL for hosts with superordinate
domains because the query in this case is not done in a transaction. We
missed this during testing because a) we didn't have a test for lookups of
hosts with superordinate domains and b) we missed converting
NameserverWhoisResponseTest to a DualDatabaseTest.
This PR fixes the problem and adds the requisite testing.
* Use a single transaction to get host registrars
* Replace streaming with Maps.toMap()
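A hedged sketch combining the two changes above (identifiers illustrative;
`loadByClientIdCached` follows the repo's naming at the time but may differ):

```java
import static com.google.common.collect.Maps.toMap;
import static google.registry.persistence.transaction.TransactionManagerFactory.tm;

import com.google.common.collect.ImmutableMap;

// Resolve every registrar needed for the response inside one transaction,
// keyed by client ID, instead of streaming per-host lookups.
ImmutableMap<String, Registrar> registrars =
    tm().transact(
            () ->
                toMap(
                    clientIds, // the distinct registrar client IDs for the hosts
                    clientId -> Registrar.loadByClientIdCached(clientId).get()));
```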
* Downgrade dependencies that no longer support Java8
Downgrade two dependencies whose latest versions no longer support
java8.
A follow up PR will add java8 compatibility to presubmit tests.
We have recently started to routinely breach the 1h timeout, so increase this
value to 2h. We should also look into reusing the artifacts when building RCs
for different environments.
* Use Gradle dependency dynamic versioning
Use dynamic versioning for Gradle dependencies when possible.
Please refer to go/dr-dependency-upgrade for more information about the
automation plan.
This PR calls out all dependencies that must be pinned to specific
versions for various reasons. The remaining ones are converted to
open-ended version ranges ("[version_str,)").
* Check PAK on domain create
* Add unit test
* Update docs
* Remove unnecessary setup
* Fix blank line
* Add check and test to all relevant flows
* Change error message
* Ignore version UIDs during txn deserialization
When deserializing transactions for replay to datastore, ignore class version
UIDs that don't match those of the local classes and just use the local class
descriptors instead. This is a simple solution for the problem of persisted
VKeys containing references to classes where the class has been updated and
the serial version UID has changed.
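The standard Java serialization hook for this is overriding
`readClassDescriptor()`; a minimal sketch of the approach (not the literal
replay code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

class LenientObjectInputStream extends ObjectInputStream {
  LenientObjectInputStream(InputStream in) throws IOException {
    super(in);
  }

  @Override
  protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
    ObjectStreamClass persisted = super.readClassDescriptor();
    Class<?> localClass;
    try {
      localClass = Class.forName(persisted.getName());
    } catch (ClassNotFoundException e) {
      return persisted; // no local version of the class; keep the stream's descriptor
    }
    ObjectStreamClass local = ObjectStreamClass.lookup(localClass);
    // On a serialVersionUID mismatch, trust the local class descriptor instead
    // of failing with an InvalidClassException.
    if (local != null && local.getSerialVersionUID() != persisted.getSerialVersionUID()) {
      return local;
    }
    return persisted;
  }
}
```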
Also add a "replay_txns" command that replays the transactions from a given
start point so we can verify all transactions are deserializable.
TESTED:
Ran replay_txns against all transactions on sandbox beginning with
transaction id 1828385, which includes Recurring billing events containing
both the old and the new serial version UIDs.
* Change check for root directory during rollback
`rollback_tool` tries to infer the root of the nomulus tree by checking for a
directory named "nomulus". This is potentially problematic (and, indeed, was
for me) since there is no guarantee what that directory will be named.
There are a number of features that characterize the root directory. Check
for the presence of the `rollback_tool` wrapper script, as this is both at
root level and tightly coupled to the Python code, so hopefully we won't
move it without testing that the script still works.
This will require edits to a substantial number of registrars on sandbox (nearly
all of them), because almost all of them have access to at least one TLD but
almost none of them have any billing accounts set. Until billing accounts are
set, any update to an existing registrar that doesn't add them will fail.
Unfortunately, there wasn't any less invasive foolproof way to implement this
change, and we already had one attempt to implement it on create registrar
command that wasn't working (because allowed TLDs tend not to be added on
initial registrar creation, but rather, afterwards as an update).
* Downgrade Caffeine to 2.9.3
Apparently Caffeine >=3.* requires Java 11, and we're still stuck on Java 8
because of App Engine Standard. Fortunately this doesn't affect the exposed
interface we're using, so we can simply go back to the newest Caffeine version
once Registry 3.0 Phase 3 (GKE migration) is completed.
Now that SQL is the default, we do not need this side job to run
alongside the main one. Its purpose was to validate the BEAM pipeline
while Datastore was primary.
* Create a Dataflow pipeline to resave EPP resources
This has two modes.
If `fast` is false, then we will just load all EPP resources, project them to the current time, and save them.
If `fast` is true, we will attempt to intelligently load and save only resources that we expect to have changes applied when we project them to the current time. This means resources with pending transfers that have expired, domains with expired grace periods, and non-deleted domains that have expired (we expect that they autorenewed).
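As a hedged illustration of the `fast` criteria (the accessors below are
illustrative, and the real pipeline selects candidates via queries rather than
a per-entity predicate):

```java
import org.joda.time.DateTime;

// A resource needs resaving only if projecting it to the current time would
// actually change its state.
boolean needsResave(DomainBase domain, DateTime now) {
  boolean transferExpired =
      domain.getTransferData().getPendingTransferExpirationTime() != null
          && domain.getTransferData().getPendingTransferExpirationTime().isBefore(now);
  boolean gracePeriodExpired =
      domain.getGracePeriods().stream().anyMatch(gp -> gp.getExpirationTime().isBefore(now));
  boolean autorenewed =
      domain.getDeletionTime().isAfter(now) // not deleted
          && domain.getRegistrationExpirationTime().isBefore(now);
  return transferExpired || gracePeriodExpired || autorenewed;
}
```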
For some inexplicable reason, the RDE beam pipeline in both sandbox and
production has been broken for the past week or so. Our investigation
revealed that during the CoGroupByKey stage, some repo ID -> revision ID
pairs were duplicated. This may be a problem with the Dataflow runtime,
which somehow introduced the duplicates during reshuffling.
This PR attempts to fix the symptom only by deduping the revision IDs. We
will do some more investigation and possibly follow up with the Dataflow
team if we determine it is an upstream issue.
TESTED=deployed the pipeline and successfully ran sandbox RDE with it.
* Begin migration from Guava Cache to Caffeine
Caffeine is apparently strictly superior to the older Guava Cache (and is even
recommended over Guava Cache in Guava's own documentation).
This adds the relevant dependencies and switches over just a single call site
to use the new Caffeine cache. It also implements a new pattern: asynchronously
refreshing the cache value starting at half of the configured expiry time. For
frequently accessed entities this will allow us to NEVER block on a load, as
the value will be refreshed in the background long before it would expire and
force a synchronous load during a read.
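A minimal sketch of the pattern, assuming a ten-minute configured expiry and a
hypothetical `loadRegistrar()` loader:

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.time.Duration;

Duration expiry = Duration.ofMinutes(10); // stand-in for the configured cache time
LoadingCache<String, Registrar> cache =
    Caffeine.newBuilder()
        .expireAfterWrite(expiry)
        // After half the expiry, the next read returns the current value and
        // triggers an asynchronous reload, so hot entries never block on a load.
        .refreshAfterWrite(expiry.dividedBy(2))
        .build(key -> loadRegistrar(key)); // hypothetical loader
```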