* Ignore trivial differences when comparing DB
Some data differences are due to entity model differences and are
harmless. We should ignore them when comparing Datastore and SQL.
This PR ignores the following diffs:
- null vs empty collection
- the empty string in the Address.street field, which is a list
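A minimal sketch of the two normalizations listed above. TrivialDiffs and its
method names are hypothetical and are not taken from this PR; the real
comparison code may be structured differently.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper; not the actual Nomulus comparison code. */
final class TrivialDiffs {
  private TrivialDiffs() {}

  /** Treats a null collection and an empty collection as the same value. */
  static boolean sameCollection(Collection<?> a, Collection<?> b) {
    boolean aEmpty = a == null || a.isEmpty();
    boolean bEmpty = b == null || b.isEmpty();
    if (aEmpty || bEmpty) {
      return aEmpty && bEmpty;
    }
    return a.equals(b);
  }

  /** Drops null and empty strings from a street list; a null list becomes an empty list. */
  static List<String> normalizeStreet(List<String> street) {
    if (street == null) {
      return List.of();
    }
    return street.stream()
        .filter(line -> line != null && !line.isEmpty())
        .collect(Collectors.toList());
  }
}
```

With these helpers, a street list holding only the empty string normalizes to
an empty list, which in turn compares equal to null.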
* Bump flogger and beam dependency versions
Beam 2.34.0 -> 2.37.0
Flogger 0.7.3 -> 0.7.4
IntelliJ keeps getting confused about which version of Flogger we're
bringing in. Even though we had previously locked Flogger to 0.7.3, for
some reason it was still bringing in the Beam transitive dependency of
0.6.0, which was causing a bunch of class initialization errors.
Bumping Beam to 2.37.0 bumps the transitive dependency to 0.7.4 so we
can always use that.
1. testRun_withPrefix() in RdeUploadActionTest calls a mock lock
handler and does not actually try to read from the fake GCS
implementation. Therefore there's no point setting it up.
2. Remove an unused field in UploadDatastoreBackupActionTest.
* Remove static methods in back up actions
* Remove BigqueryPollJob helper class
* Add schedule time in task comparison
* Change payload type from byte[] to ByteString
* Fix a subtle issue in BRDA copy caused by Cloud Tasks
After the Cloud Tasks migration and #1508, the BRDA copy job now
routinely fails on the first try because the revision update is not
committed by the time the Cloud Tasks job enqueued in the same
transaction runs for the first time. This is because the enqueueing is
a side effect and not part of the transaction. The job eventually
succeeds because of retries.
This PR attempts to mitigate the initial failure by adding a delay to
the enqueued job, and checking the cursor in the job itself to prevent
it from running before the transaction is committed.
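A rough sketch of the two mitigations, for illustration only. BrdaCopyGuard,
the one-minute delay, and the cursor-versus-watermark comparison are all
assumptions made for this sketch; they are not the names, values, or exact
check used in this PR.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the guard; names do not come from the Nomulus code. */
final class BrdaCopyGuard {
  private BrdaCopyGuard() {}

  /** Assumed delay that gives the enqueueing transaction time to commit. */
  static final Duration ENQUEUE_DELAY = Duration.ofMinutes(1);

  /** Schedule time to attach to the task when it is enqueued. */
  static Instant scheduleTime(Instant now) {
    return now.plus(ENQUEUE_DELAY);
  }

  /**
   * Assumed check made by the job itself: the copy proceeds only once the
   * committed cursor has moved past the watermark being copied. Otherwise the
   * job should bail out and let the task queue retry it.
   */
  static boolean readyToCopy(Instant committedCursor, Instant watermark) {
    return committedCursor.isAfter(watermark);
  }
}
```

The delay makes the first attempt likely to succeed, while the cursor check
keeps the job correct even when the delay turns out to be too short.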
* Fix issues with saving and deleting gap records
Datastore limits us to mutating up to 25 records per transaction. We
sometimes exceed that when deleting expired gap records. In addition, it is
theoretically possible for us to accumulate enough continuous gap records to
exceed this count while replaying the original transaction.
Deal with deletion by breaking up the gap records to be deleted into a batch
size that is small enough to be deleted transactionally (in practice, we don't
much care about the transactionality but it doesn't seem like we can delete
batches without it).
Deal with the possibility of too many additions by always breaking out gap
record storage and last transaction number updates into their own
transaction(s) (separate from the replay of the original SQL transaction).
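The batching of deletions can be sketched as below. GapBatcher is a
hypothetical name, and the sketch only splits a list into batches; running
each batch in its own transaction is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits records into transaction-sized batches. */
final class GapBatcher {
  private GapBatcher() {}

  /** Datastore's per-transaction mutation limit, as stated in the commit message. */
  static final int MAX_PER_TRANSACTION = 25;

  /** Splits items into consecutive batches of at most batchSize elements. */
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      // Copy the view so each batch is independent of the source list.
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }
}
```

Each batch returned by partition(expiredGaps, GapBatcher.MAX_PER_TRANSACTION)
would then be deleted in a separate transaction.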
These are useful for the purposes of filtering by one-time/multi-use tokens, and
for determining which one-time tokens have been used (and if so, for which
domain).
* Track and replay Transaction table gaps
Id gaps in the Transaction table can be the result of transactions committed
out of order. To deal with this, keep track of gaps for up to five minutes
and check to see if they've been back-filled prior to applying the next batch
of transactions during replay.
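A minimal in-memory sketch of the gap bookkeeping described above. GapTracker
and its methods are hypothetical; the actual change persists gap records
rather than holding them in memory, and its details may differ.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory tracker for gaps in a transaction id sequence. */
final class GapTracker {
  /** Gaps are kept for up to five minutes, per the commit message. */
  static final Duration GAP_LIFETIME = Duration.ofMinutes(5);

  /** Missing transaction id mapped to the time the gap was first observed. */
  private final TreeMap<Long, Instant> gaps = new TreeMap<>();

  private long lastSeenId;

  GapTracker(long lastSeenId) {
    this.lastSeenId = lastSeenId;
  }

  /** Records every id skipped between the last seen id and nextId. */
  void observe(long nextId, Instant now) {
    for (long id = lastSeenId + 1; id < nextId; id++) {
      gaps.put(id, now);
    }
    lastSeenId = Math.max(lastSeenId, nextId);
  }

  /** Drops gaps older than GAP_LIFETIME and returns the ids still worth re-checking. */
  List<Long> gapsToRecheck(Instant now) {
    gaps.values()
        .removeIf(seenAt -> Duration.between(seenAt, now).compareTo(GAP_LIFETIME) > 0);
    return new ArrayList<>(gaps.keySet());
  }

  /** Call once a previously missing transaction has been found and replayed. */
  void markFilled(long id) {
    gaps.remove(id);
  }
}
```

Before each batch, the replay loop would query the ids from gapsToRecheck(),
replay any that have since appeared, and call markFilled() for them.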
* Changes for review
* Calculate gap expiration time before gap queries
* Reformat.
* Add 3 more SQL indexes to the Host table
These indexes on creationTime, deletionTime, and currentSponsorRegistrarId are
present on the other two EPP resource tables (Domain and Contact), and are
useful for a wide variety of operations/analytics queries.
* Improve cache loading in Registries.java
The loader for the TLD cache in Registries.java unnecessarily reads from
another cache when running with SQL, potentially triggering additional
database access. This code runs in the whois query path, and contributes
to the high latency in sandbox.
The query analyzer identified this as a missing index on the BillingEvent table,
and I added it for recurrences and cancellations as well as it's likely to be a
problem for them too. "Give me all the billing events associated with a given
domain by its repo ID" seems like a pretty common use case for the DB (and does
appear to be used by our invoicing pipeline).
This is a follow-up to PR #1545.
These indexes were identified as missing by PostgreSQL's query analyzer in our
sandbox environment (where we get enough realistic EPP traffic to identify these
deficiencies).
Note that a lot of the new indexes being named have to use the DB representation
of the column name because they are either embedded or subclassed entities,
whereas most of the existing ones are able to simply refer to Java field names.
This is the Java schema follow-up PR to PR #1541, which is what added the
actual DB changes through Flyway scripts.
- Use the standard HttpsURLConnection to write/read data
- Rewrite RdeReporter, Nordn*Action, and Marksdb classes and related
tests to conform to the new format
- Remove FakeURLFetchService and ForwardingUrlFetchService as they weren't used
- Refactor UrlFetchException to UrlConnectionException
- Refactor UrlFetchUtils to UrlConnectionUtils
I will need to test this on Alpha. Fortunately the connections that
don't require auth (e.g. TMDB downloading) should be testable.
* Allow replicateToDatastore to skip gaps
As it turns out, gaps in the transaction id sequence number are expected
because rollbacks do not roll back sequence numbers.
To deal with this, stop checking these.
This change is not adequate in and of itself, as it is possible for a gap to
be introduced if two transactions are committed out of order of their sequence
number. We are currently discussing several strategies to mitigate this.
* Remove println, add a record verification
* Fix hanging test
Tests using the TestServerExtension may hang forever if an underlying
component (e.g., testcontainer for psql) fails. This may be the cause
of some Kokoro runs that timed out after three hours.
* Add a tools command to launch SQL validation job
Stop using Pipeline.run().waitUntilFinish in
ValidateDatastorePipeline. Flex-template does not support blocking
wait in the main thread.
This PR adds a new ValidateSqlCommand that launches the pipeline and
maintains the SQL snapshot while the pipeline is running.
This PR also adds more parameters to both ValidateSqlCommand and
ValidateDatastoreCommand:
- The -c option to supply an optional incremental comparison start time
- The -r option to supply an optional release tag that is not 'live',
e.g., nomulus-DDDDYYMM-RC00
If the manual launch option (-m) is enabled, the commands will print the
gcloud command that can launch the pipeline.
Tested with sandbox, qa and the dev project.
* Don't reset the update time for TLD updates
It turns out that the reason that the Registrar update timestamp isn't updated
for some of the tests is because the record is updated unchanged. We can
avoid this problem by not trying to update the registrar to the same value.
So in this case, if the registrar already contains the TLD we're adding, don't
try to add it.
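The guard amounts to a "no change, no update" check, sketched below with a
hypothetical AllowedTlds helper operating on a plain set of TLD strings. It is
not the Registrar code from this change.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper illustrating the skip-if-present check. */
final class AllowedTlds {
  private AllowedTlds() {}

  /**
   * Returns the same instance when tld is already present, which signals to
   * the caller that no update (and no save) is needed.
   */
  static Set<String> withTld(Set<String> current, String tld) {
    if (current.contains(tld)) {
      return current;
    }
    Set<String> updated = new LinkedHashSet<>(current);
    updated.add(tld);
    return Collections.unmodifiableSet(updated);
  }
}
```

A caller can compare the result with the input by reference and persist the
registrar only when a new set was returned.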
* Fix entity delete replication, compare db @ replay
Replay tests currently only verify that the contents of a transaction
can be successfully replicated to the other database. They do not verify that
the contents of both databases are equivalent. As a result, we miss any
changes omitted from the transaction (as was the case with entity deletions).
This change adds a final database comparison to ReplayExtension so we can
safely say that the databases are in the same state.
This comparison is introduced in part as a unit test for the one-line fix for
replication of an "entity delete" operation (where we delete using an entity
object instead of the object's key) which so far has only affected PollMessage
deletion. The fix is also included in this commit within
JpaTransactionManagerImpl.
* Exclude tests and entities with failing comparisons
* Get all tests to pass and fix more timestamp
Fix most of the unit tests that were broken by this change.
- Fix timestamp updates after grace period changes in DomainContent and for
TLD changes in Registrar.
- Reenable full database comparison for most DomainCreateFlowTest's.
- Make some test entities NonReplicated so they don't break when used with
jpaTm().delete()
- Disable checking of a few more entity types that are failing comparisons.
- Add some formatting fixes.
* Remove unnecessary "NoDatabaseCompare"
It turns out that after other fixes/elisions we no longer need these for
any tests in DomainCreateFlowTest.
* Changes for review
* Remove old "compare" flag.
* Reformatted.
* Make a few quality-of-life improvements in CloudTasksUtils
1. Update the method names. There are too many overloaded methods and it
is hard to figure out which one does which without checking the
javadoc.
2. Add a method in the task matcher to specify the delay time in
DateTime, so the caller does not need to convert it to Timestamp.
3. Remove the explicit dependency on a clock when enqueueing a task with
delay; the clock is now injected directly into the util instance
itself.
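Item 3 can be illustrated with the sketch below. DelayedTaskScheduler is a
hypothetical class, and java.time.Clock stands in for whatever clock type
CloudTasksUtils actually receives; the point is only that callers pass a delay
and never a clock.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical utility that owns its clock instead of taking one per call. */
final class DelayedTaskScheduler {
  private final Clock clock;

  /** The clock is supplied once, at construction (for example by injection). */
  DelayedTaskScheduler(Clock clock) {
    this.clock = clock;
  }

  /** Computes the schedule time for a task that should run after the given delay. */
  Instant scheduleTimeFor(Duration delay) {
    if (delay.isNegative()) {
      throw new IllegalArgumentException("delay must not be negative");
    }
    return clock.instant().plus(delay);
  }
}
```

Tests can construct the scheduler with Clock.fixed(...) to get deterministic
schedule times.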
* Disable prober data deletion cron job in prod & sandbox
These cron jobs would unnecessarily complicate the database migration, and we
don't need them that badly. We'll re-enable them once we've written
the new version of this action that handles Cloud SQL correctly (the current
version only does Datastore anyway).
* Ignore prober data when comparing databases
Completely ignore prober data when comparing Datastore and SQL.
Prober data deletions are not propagated from Datastore to SQL. It is
difficult to distinguish soft-deletes from normal updates, therefore
difficult to avoid false positives when looking for differences.
This is necessary because the Cloud Tasks API is not transactionally enrolled,
so it's possible that multiple tasks might end up being enqueued. We need to be
able to handle them.
* Fix update timestamps for DomainContent types
We expect update timestamps to be updated whenever a containing entity is
modified and persisted, but unfortunately Hibernate doesn't seem to do this --
instead it appears to regard such an entity as unchanged.
To work around this, we explicitly reset the update timestamp whenever a
nested collection is modified in the Builder.
Note that this change only solves the problem for DomainContent. All other
entities containing UpdateAutoTimestamp will need to be audited and
instrumented with a similar change.
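A simplified sketch of resetting the update timestamp in a Builder.
DomainSketch is a hypothetical stand-in for DomainContent, and the convention
that a null timestamp means "assign a fresh value on the next save" is an
assumption of this sketch.

```java
import java.time.Instant;
import java.util.Set;

/** Hypothetical stand-in for an entity with an auto-updated timestamp. */
final class DomainSketch {
  final Set<String> nameservers;

  /** null means "assign a fresh timestamp on the next save" (assumed convention). */
  final Instant updateTimestamp;

  DomainSketch(Set<String> nameservers, Instant updateTimestamp) {
    this.nameservers = Set.copyOf(nameservers);
    this.updateTimestamp = updateTimestamp;
  }

  Builder asBuilder() {
    return new Builder(this);
  }

  static final class Builder {
    private Set<String> nameservers;
    private Instant updateTimestamp;

    private Builder(DomainSketch source) {
      this.nameservers = source.nameservers;
      this.updateTimestamp = source.updateTimestamp;
    }

    /** Changing the nested collection also clears the timestamp so it is regenerated. */
    Builder setNameservers(Set<String> nameservers) {
      this.nameservers = Set.copyOf(nameservers);
      this.updateTimestamp = null;
      return this;
    }

    DomainSketch build() {
      return new DomainSketch(nameservers, updateTimestamp);
    }
  }
}
```

Every Builder setter that touches a nested collection would need the same
reset, which is why other entity types have to be audited one by one.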
* Fix a handful of tests broken by this change
* Reformatted.
* Use CloudTaskUtils to enqueue
* Add CloudTasksUtilsModule to FrontendComponent
* Fix Uri query issue
* Remove header and check service in matcher
* Use a ThreadLocal boolean in TestServer to determine enqueueing
* Extract enqueuing and email sending from tm().transact()
* Add action to DB comparison pipeline
Add a backend Action in Nomulus server that launches the pipeline for
comparing Datastore (secondary) with Cloud SQL (primary).
* Save progress
* Revert test changes
* Add pipeline launching
* Fix create/update timestamp replay problems
When CreateAutoTimestamp and UpdateAutoTimestamp are inserted into a
Transaction, their values are not populated in the same way as when they are
stored in the course of an SQL commit. This results in different timestamp
values between SQL and Datastore during the SQL -> DS replay.
Fix this by providing these values from the JPA transaction time when we're
doing transaction serialization.
This change also removes the initialization of the Ofy clock in
ExpandRecurringBillingEventsActionTest. It's not necessary as the
ReplayExtension already takes care of this and doing it after the
ReplayExtension as we were breaks a test now that the update timestamps are
correct.
* Remove ReportingUtils and use CloudTaskUtil to enqueue
* Use schedule time helper to enqueue and update schedule time comparison
* Fix comment, indentation in gradle file and improve time comparison
* Change from TaskQueueUtils to CloudTasksUtils in LoadTestAction
* Put X_CSRF_TOKEN in task headers
* Fix schedule time and gradle issue
* Remove TaskQueue constant dependency
* Double run seconds
* Add comment for X_CSRF_TOKEN
* Fix flaky RdeStagingActionDatastoreTest
Fixed the most common cause that makes one method flaky (Clock and
timestamp problem). Added a TODO to rethink test case.
Also added notes on tasks potentially enqueued multiple times.
* Add an index on Host.host_name column
This field is queried during host creation and needs an index to speed
up the query.
Since Hibernate does not explicitly refer to indexes, we can change the
code and schema in one PR.