The main purpose of this PR is to help debug b/234189023, in which a
registrar reported that in sandbox they received seemingly successful EPP
update responses to delete NS records, but the records were not actually
deleted after the commands executed.
Actually loading the persisted domain resource after an update would
require executing another transaction immediately after the update
transaction. That can only be done outside the flow (i.e. in FlowRunner
or EppController), and we would also need to check the flow type before
logging, which seems unnecessarily complex.
For now we are just adding logs inside the update transaction itself to
validate that:
1. The NS records to delete are as expected.
2. The current NS records are as expected.
3. The new NS records to persist are as expected.
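The following is a minimal sketch of these three log points. It assumes the update is modeled as plain sets of nameserver host names and that removals are applied before additions. `NameserverUpdateLogger` and `computeAndLog` are hypothetical names. `java.util.logging` stands in for whatever logger the flow actually uses, and the real flow works with Nomulus's own host references rather than strings.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.logging.Logger;

/** Hypothetical helper illustrating logging inside the update transaction. */
final class NameserverUpdateLogger {
  private static final Logger logger =
      Logger.getLogger(NameserverUpdateLogger.class.getName());

  private NameserverUpdateLogger() {}

  /**
   * Logs the NS records to delete, the current NS records, and the resulting NS records to
   * persist, then returns the resulting set. Assumes removals are applied before additions.
   */
  static Set<String> computeAndLog(
      Set<String> currentNs, Set<String> nsToAdd, Set<String> nsToDelete) {
    // TreeSet copies give a stable, sorted order in the log output.
    logger.info("NS records to delete: " + new TreeSet<>(nsToDelete));
    logger.info("Current NS records: " + new TreeSet<>(currentNs));
    Set<String> newNs = new TreeSet<>(currentNs);
    newNs.removeAll(nsToDelete);
    newNs.addAll(nsToAdd);
    logger.info("New NS records to persist: " + newNs);
    return Collections.unmodifiableSet(newNs);
  }
}
```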
The EPP success reply is the default reply when no errors are thrown in
a transaction. If we see a success reply (which means that the
transaction finished successfully) and the expected logs from the
transaction, the only explanation would be that somewhere in the ORM
layer the Java representation of the entity differs from what is being
presented to the database. I think that would signal a much bigger and
more fundamental problem, which is quite unlikely given how isolated the
issue under consideration is.
In any case, we would like to add this logging in sandbox and ask the
registrar to report again when they see similar issues.
Also made some typo and linting fixes.
* Fix some small transactional issues in SQL mode
These weren't caught until I switched the default database type in tests
to SQL (separate PR). Fortunately, these don't seem to be catastrophic.
This includes:
- removing the actions that do the replay
- removing the tests for the replay
- removing the ReplayExtension and adjusting the various tests that used
  it accordingly
- removing functionality relating to "things that happen during replay",
e.g. beforeSqlSaveOnReplay
This does not include:
- removing the InitSqlPipeline or similar tasks
- removing e.g. SqlEntity (it's used in other places)
- removing Transforms/RegistryJpaIO and other SQL-pipeline-creation code
This included removing ofy-specific code from various tests. Also, some
of the other tests (e.g. RdapDomainActionTest) had to be configured to
use only SQL; otherwise, as things currently stand, they would try to
use ofy.
We also delete the CreateSyntheticHistoryEntriesAction and pipeline
because they're no longer relevant and are impossible to test (the goal
of the actions was to create objects in ofy, which doesn't happen
anymore).
We're running into issues pulling 2.1.3 from Maven, possibly due to
vulnerabilities in dependencies, so this updates it to the most recent
version, 2.2.6.
* Inject a DomainPricingLogic into ExpandRecurringBillingEventsAction
This will be used in other PRs to set the renewal price correctly based on the
renewal price behavior of the BillingRecurrence event.
Note that, in order for this to work, a not-null constraint has been lifted on
the EPP flow state when the DomainPricingCustomLogic is being constructed, as
the pricing here will occur in a backend action outside the context of any EPP
flow.
* Slightly improve performance of ExpandRecurringBillingEventsAction
We don't need to log every single no-op batch of 50 Recurrences that are
processed (considering we have 1.5M total in our system), and we also don't need
to process Recurrences that already ended prior to the Cursor time (this gets us
down to 420k from 1.5M).
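The second optimization amounts to a predicate on each recurrence's end time. The sketch below shows that predicate in memory. `RecurrenceSummary` and `RecurrenceFilter` are hypothetical stand-ins, and they use `java.time.Instant` rather than whatever time type the real entities use. The real action would more likely apply the same condition in its database query, so the excluded rows are never loaded at all.

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, simplified stand-in for a BillingRecurrence. */
final class RecurrenceSummary {
  final long id;
  // Assumed to be a far-future time for recurrences that never end.
  final Instant recurrenceEndTime;

  RecurrenceSummary(long id, Instant recurrenceEndTime) {
    this.id = id;
    this.recurrenceEndTime = recurrenceEndTime;
  }
}

final class RecurrenceFilter {
  private RecurrenceFilter() {}

  /** Drops recurrences that ended strictly before the cursor time; all others are kept. */
  static List<RecurrenceSummary> stillRelevant(
      List<RecurrenceSummary> recurrences, Instant cursorTime) {
    return recurrences.stream()
        .filter(r -> !r.recurrenceEndTime.isBefore(cursorTime))
        .collect(Collectors.toList());
  }
}
```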
* Disable Ofy tests.
This change just turns off the Ofy tests at the root, by removing processing
for dual tests and disassociating the TestOfyOnly annotation from test
annotations.
This is far less comprehensive than #1631, but it's probably worth entering as
a stopgap solution just because it should speed up our test runs and unblock a
lot of other cleanup work.
* Fix DualDatabaseTestInvocationContextProviderTest
* Optimize RDAP entity event query
For each EPP entity, directly load the latest HistoryEntry per event type
instead of loading all events through the HistoryEntryDao.
Although most entities have a small number of history entries, a few
entities have enough entries to cause an OutOfMemoryError.
* Add batching to ExpandRecurringBillingEventsAction
It's OOMing when trying to load every single BillingRecurrence that needs to be
expanded all at once (which is to be expected), so this processes them in
transactional batches of 50.
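The following is a minimal sketch of the batching. It assumes the action can first collect lightweight keys (e.g. IDs) for the recurrences to expand, then load and expand each batch inside its own transaction. `BatchedExpansion`, `processInBatches`, and the `runInTransaction` callback are hypothetical. The callback stands in for a transaction that loads and expands the entities for one batch of keys, so only one batch of full entities needs to be in memory at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper that splits work into fixed-size transactional batches. */
final class BatchedExpansion {
  static final int BATCH_SIZE = 50;

  private BatchedExpansion() {}

  /**
   * Splits keys into consecutive batches of at most batchSize and passes each batch to
   * runInTransaction, returning the number of batches processed.
   */
  static <K> int processInBatches(
      List<K> keys, int batchSize, Consumer<List<K>> runInTransaction) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
    }
    int batchCount = 0;
    for (int start = 0; start < keys.size(); start += batchSize) {
      int end = Math.min(start + batchSize, keys.size());
      // Defensive copy so the callback cannot modify the caller's key list.
      runInTransaction.accept(new ArrayList<>(keys.subList(start, end)));
      batchCount++;
    }
    return batchCount;
  }
}
```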
* Cleanup gpg-agent instances and home directories
The GpgSystemCommandException leaks home directories, but more importantly it
leaks gpg-agent instances. This can cause problems with inotify limits, since
the agent seems to make use of inotify. Do a proper cleanup in afterEach().
* Don't fail if we can't kill the agent
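The following sketch shows a best-effort teardown along these lines, intended to be called from a JUnit `afterEach()` method. It assumes GnuPG's `gpgconf` tool is on the `PATH` and uses its `--homedir` and `--kill gpg-agent` options. `GpgTestCleanup` is a hypothetical name, and this is not necessarily how the actual test does it. If `gpgconf` is missing, fails, or times out, the method still deletes the home directory rather than failing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical teardown helper for tests that create temporary GnuPG home directories. */
final class GpgTestCleanup {
  private GpgTestCleanup() {}

  /**
   * Tries to kill the gpg-agent serving gpgHome, then deletes gpgHome. Returns whether the
   * kill command succeeded; a failed kill never makes this method throw.
   */
  static boolean cleanUp(Path gpgHome) throws IOException {
    boolean killed = false;
    Process process = null;
    try {
      process =
          new ProcessBuilder("gpgconf", "--homedir", gpgHome.toString(), "--kill", "gpg-agent")
              .inheritIO()
              .start();
      killed = process.waitFor(10, TimeUnit.SECONDS) && process.exitValue() == 0;
    } catch (IOException e) {
      // gpgconf is missing or could not be started: don't fail, just move on to deletion.
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      if (process != null && process.isAlive()) {
        process.destroyForcibly();
      }
    }
    deleteRecursively(gpgHome);
    return killed;
  }

  private static void deleteRecursively(Path root) throws IOException {
    if (!Files.exists(root)) {
      return;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse order puts children before their parent directories.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path path : paths) {
      Files.deleteIfExists(path);
    }
  }
}
```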
We'll delete the associated code soon enough too, but it's safer to delete the
cron jobs first and run in that state for a week, so we can still run them
manually if need be.
* Reduce the number of manually scaled instances for default/pubapi
This is in the spirit of "not always running significantly over-provisioned",
which helps to save costs and also expose potential scaling issues when they are
still small rather than all at once when they're a big problem.
This can always be reverted if necessary, and can be instantaneously adjusted by
running the `nomulus set_num_instances` command.
* Don't enforce billing account map check on TEST TLDs
This was affecting monitoring (i.e. prober TLDs). Note that test TLDs are
already excluded from the billing account map check in the Registrar builder()
method (see PR #1601), but we forgot to make that same test TLD exclusion in the
EPP flows check (see PR #1605).
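The following sketches the check with the TEST TLD exemption. It makes two hypothetical assumptions: that the registrar's billing account map is keyed by currency code, and that each TLD is either REAL or TEST. `TldKind`, `TldInfo`, and `BillingAccountCheck` are illustrative names, not the actual Nomulus classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical TLD kinds; only TEST TLDs are exempt from the check. */
enum TldKind { REAL, TEST }

/** Hypothetical, simplified view of a TLD for this check. */
final class TldInfo {
  final String name;
  final TldKind kind;
  final String currencyCode; // e.g. "USD"

  TldInfo(String name, TldKind kind, String currencyCode) {
    this.name = name;
    this.kind = kind;
    this.currencyCode = currencyCode;
  }
}

final class BillingAccountCheck {
  private BillingAccountCheck() {}

  /**
   * Returns the names of allowed non-TEST TLDs whose currency has no entry in the registrar's
   * billing account map (assumed here to be keyed by currency code).
   */
  static List<String> tldsMissingBillingAccount(
      List<TldInfo> allowedTlds, Map<String, String> billingAccountMap) {
    List<String> missing = new ArrayList<>();
    for (TldInfo tld : allowedTlds) {
      if (tld.kind == TldKind.TEST) {
        continue; // TEST TLDs (e.g. prober TLDs used by monitoring) are exempt.
      }
      if (!billingAccountMap.containsKey(tld.currencyCode)) {
        missing.add(tld.name);
      }
    }
    return missing;
  }
}
```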
* Add missing transaction for whois lookups
Nameserver whois lookups are failing under SQL for hosts with superordinate
domains because the query in this case is not done in a transaction. We
missed this during testing because a) we didn't have a test for lookups of
hosts with superordinate domains and b) we missed converting
NameserverWhoisResponseTest to a DualDatabaseTest.
This PR fixes the problem and adds the requisite testing.
* Use a single transaction to get host registrars
* Replace streaming with Maps.toMap()
* Downgrade dependencies that no longer support Java 8
Downgrade two dependencies whose latest versions no longer support
Java 8.
A follow-up PR will add Java 8 compatibility checks to presubmit tests.
* Use Gradle dependency dynamic versioning
Use dynamic versioning for Gradle dependencies when possible.
Please refer to go/dr-dependency-upgrade for more information about the
automation plan.
This PR calls out all dependencies that must be pinned to specific
versions for various reasons. The remaining ones are converted to
open-ended version ranges ("[version_str,)").
* Check PAK on domain create
* Add unit test
* update docs
* Remove unnecessary setup
* Fix blank line
* Add check and test to all relevant flows
* Change error message
* Ignore version UIDs during txn deserialization
When deserializing transactions for replay to Datastore, ignore class version
UIDs that don't match those of the local classes and just use the local class
descriptors instead. This is a simple solution for the problem of persisted
VKeys containing references to classes where the class has been updated and
the serial version UID has changed.
Also add a "replay_txns" command that replays the transactions from a given
start point so we can verify all transactions are deserializable.
TESTED:
Ran replay_txns against all transactions on sandbox beginning with
transaction id 1828385, which includes Recurring billing events containing
both the old and the new serial version UIDs.
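One common way to implement "use the local class descriptor when the version UID does not match" in plain Java is to override `ObjectInputStream.readClassDescriptor()`, as sketched below. This is a standard JDK technique, not necessarily the exact code in this change, and the class names are hypothetical. It is only safe when the class's serializable fields are unchanged, because the returned local descriptor's field list is then used to read the data in the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Hypothetical stream that substitutes local class descriptors on serialVersionUID mismatch. */
class LenientObjectInputStream extends ObjectInputStream {
  LenientObjectInputStream(InputStream in) throws IOException {
    super(in);
  }

  @Override
  protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
    ObjectStreamClass streamDescriptor = super.readClassDescriptor();
    Class<?> localClass;
    try {
      localClass = Class.forName(streamDescriptor.getName(), false, getClass().getClassLoader());
    } catch (ClassNotFoundException e) {
      // Unknown class: keep the stream descriptor and let resolveClass report the failure.
      return streamDescriptor;
    }
    ObjectStreamClass localDescriptor = ObjectStreamClass.lookup(localClass);
    if (localDescriptor != null
        && localDescriptor.getSerialVersionUID() != streamDescriptor.getSerialVersionUID()) {
      // UIDs differ: use the local descriptor instead of failing with InvalidClassException.
      return localDescriptor;
    }
    return streamDescriptor;
  }
}

/** Example payload class used only to demonstrate the stream above. */
final class SamplePayload implements Serializable {
  private static final long serialVersionUID = 1L;
  final String value;

  SamplePayload(String value) {
    this.value = value;
  }
}
```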
This will require edits to a substantial number of registrars on sandbox (nearly
all of them) because almost all of them have access to at least one TLD, but
almost none of them have any billing accounts set. Until this is set, any updates
to the existing registrars that aren't adding the billing accounts will cause
failures.
Unfortunately, there wasn't any less invasive foolproof way to implement this
change, and we already had one attempt to implement it on create registrar
command that wasn't working (because allowed TLDs tend not to be added on
initial registrar creation, but rather, afterwards as an update).
* Downgrade Caffeine to 2.9.3
Apparently Caffeine 3.x requires Java 11, and we're still stuck on Java 8
because of App Engine Standard. Fortunately this doesn't affect the exposed
interface we're using, so we can simply go back to the newest Caffeine version
once Registry 3.0 Phase 3 (GKE migration) is completed.
Now that SQL is the default, we do not need this side job to run
alongside the main one. Its purpose was to validate the BEAM pipeline
while Datastore was primary.
* Create a Dataflow pipeline to resave EPP resources
This has two modes.
If `fast` is false, then we will just load all EPP resources, project them to the current time, and save them.
If `fast` is true, we will attempt to intelligently load and save only resources that we expect to have changes applied when we project them to the current time. This means resources with pending transfers that have expired, domains with expired grace periods, and non-deleted domains that have expired (we expect that they autorenewed).
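The `fast` mode selection criteria can be sketched as a single predicate. `ResaveCandidate` and `FastResaveFilter` are hypothetical, simplified types that use `java.time.Instant` in place of the real time type. Here, "expired" is taken to mean that the relevant time is at or before the current time.

```java
import java.time.Instant;
import java.util.List;

/** Hypothetical, simplified snapshot of an EPP resource for deciding whether to resave it. */
final class ResaveCandidate {
  final boolean deleted;
  final Instant pendingTransferExpirationTime; // null if there is no pending transfer
  final List<Instant> gracePeriodExpirationTimes; // empty if there are no grace periods
  final Instant registrationExpirationTime; // null for non-domain resources

  ResaveCandidate(
      boolean deleted,
      Instant pendingTransferExpirationTime,
      List<Instant> gracePeriodExpirationTimes,
      Instant registrationExpirationTime) {
    this.deleted = deleted;
    this.pendingTransferExpirationTime = pendingTransferExpirationTime;
    this.gracePeriodExpirationTimes = gracePeriodExpirationTimes;
    this.registrationExpirationTime = registrationExpirationTime;
  }
}

final class FastResaveFilter {
  private FastResaveFilter() {}

  /** True if projecting the resource to {@code now} is expected to change it. */
  static boolean expectChangesWhenProjected(ResaveCandidate r, Instant now) {
    boolean transferExpired =
        r.pendingTransferExpirationTime != null && !r.pendingTransferExpirationTime.isAfter(now);
    boolean gracePeriodExpired =
        r.gracePeriodExpirationTimes.stream().anyMatch(t -> !t.isAfter(now));
    // Non-deleted domains past their expiration are expected to have autorenewed.
    boolean expiredActiveDomain =
        !r.deleted
            && r.registrationExpirationTime != null
            && !r.registrationExpirationTime.isAfter(now);
    return transferExpired || gracePeriodExpired || expiredActiveDomain;
  }
}
```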
For some inexplicable reason, the RDE beam pipeline in both sandbox and
production has been broken for the past week or so. Our investigations
revealed that during the CoGroupByKey stage, some repo ID -> revision ID
pairs were duplicated. This may be a problem with the Dataflow runtime
which somehow introduced the duplicate during reshuffling.
This PR attempts to fix the symptom only by deduping the revision IDs. We
will do some more investigation and possibly follow up with the Dataflow
team if we determine it is an upstream issue.
TESTED=deployed the pipeline and successfully ran sandbox RDE with it.
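The dedup itself is small. The sketch below assumes that the revision IDs grouped under one repo ID arrive as an `Iterable<Long>`, for example from Beam's `CoGbkResult.getAll(tag)`. `RevisionIdDeduper` is a hypothetical name. Order of first appearance is preserved, so the output is deterministic for a given input order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical helper that removes duplicated revision IDs for a single repo ID. */
final class RevisionIdDeduper {
  private RevisionIdDeduper() {}

  /** Removes duplicate revision IDs while preserving first-seen order. */
  static List<Long> dedupe(Iterable<Long> revisionIds) {
    LinkedHashSet<Long> unique = new LinkedHashSet<>();
    for (Long id : revisionIds) {
      unique.add(id);
    }
    return new ArrayList<>(unique);
  }
}
```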
* Begin migration from Guava Cache to Caffeine
Caffeine is apparently strictly superior to the older Guava Cache (and is even
recommended in lieu of Guava Cache on Guava Cache's own documentation).
This adds the relevant dependencies and switches over just a single call site
to use the new Caffeine cache. It also implements a new pattern: asynchronously
refreshing the cache value once half of the configured expiration time has
elapsed. For frequently accessed entities this will allow us to NEVER block on
a load, as the value will be refreshed asynchronously in the background long
before it expires and would have to be reloaded synchronously during a read
operation.
* Add new columns to BillingEvent.java
* Improve PR and modify JodaMoneyType to handle null currency in override
* Add test cases for edge cases of nullSafeGet in JodaMoneyType
* Improve assertions