Commit graph

3963 commits

Author SHA1 Message Date
Rachel Guan
3ef1e6c6a4
Add renewal columns in BillingRecurrence (#1568)
* Add renewal columns in BillingRecurrence

* Change from event to recurrence in file name
2022-03-28 17:42:01 -04:00
Lai Jiang
9363b30b3e
Set the initial worker count for the RDE beam pipeline at 24 (#1572)
* Set the initial worker count for the RDE beam pipeline at 24

This likely will speed up the pipeline by skipping the initially slow
process of spinning up instances.
2022-03-27 22:51:58 -04:00
Weimin Yu
342f7d72a2
Fix sporadic SQL Snapshot failure (#1571)
* Fix sporadic SQL Snapshot failure

The Postgresql set-snapshot statement (called in
JpaTransactionManager.setDatabaseSnapshot() method) must be the first
statement in the SQL transaction.

Currenty the JpaTransaction.transact() method may insert a query for
DatabaseMigrationStateSchedule before the user query when the cache is
empty or the cached value expires.

This PR proactively preloads the cache in RegistryJpaIO to prevent cache
loading  inside the transaction.

This PR also changes some DatabaseSnapshotTest tests to be retrying, in
case they run just after the cache expires. (This has happened before in
CI).
2022-03-25 15:55:00 -04:00
Michael Muller
3592877210
Terminate with exit code of 1 on format check fail (#1570)
The format check script currently outputs "true" if there were files that need
reformatting and "false" if not, which is useful for gradle but less so for
other applications (notably commit hooks).  Terminate with an exit code of 1
if the format check fails.

TESTED: Tried this from both a pre-commit hook and from the gradle build.
2022-03-25 12:50:27 -04:00
Michael Muller
3006ca39ca
Add a "list_txns" to dump Transaction table (#1569)
* Add a "list_txns" to dump Transaction table

Add the list_txns command which can dump the entire contents of the
Transaction table, either in csv format or as human readable transactions.

The CSV format is useful for storing the transaction table at a specific point
in time for later reference without requiring us to repeatedly hit the
replica.

Creating this without tests because this command has a very short shelf-life
and is really only intended to be run by developers.  Tested all features
locally.

* Reformatted
2022-03-25 12:41:10 -04:00
gbrodman
63adfa77ed
Use TLS v1.3 explicitly in RDE reporting (#1564)
* Use TLS v1.3 explicitly in RDE reporting

The default Java 1.8 TLS version is 1.2 which isn't supported by the
ICANN upload site.
2022-03-25 12:09:46 -04:00
Weimin Yu
fd5e5bf6f1
Ignore trivial differences when comparing DB (#1567)
* Ignore trivial differences when comparing DB

Some data difference are due to entity model differences and also
harmless. We should igore them when comparing Datastore and SQL.

  This PR ignores the following diffs:
  - null vs empty collection
  - the empty string in Address.stree field, which is a list
2022-03-23 13:52:31 -04:00
sarahcaseybot
2495167215
Address some tiny TODOs (#1566)
* Address some tiny TODOs

* Format fix
2022-03-23 12:23:29 -04:00
gbrodman
0c6f399533
Bump flogger and beam dependency versions (#1562)
* Bump flogger and beam dependency versions

Beam 2.34.0 -> 2.37.0
Flogger 0.7.3 -> 0.7.4

Intellij keeps getting confused about which version of Flogger we're
bringing in. Even though we had previously locked Flogger to 0.7.3, for
some reason it was still bringing in the Beam transitive dependency of
0.6.0 which was causing the a bunch of class initialization errors.

Bumping Beam to 2.34.0 bumps the transitive dependency to 0.7.4 so we
can always use that.
2022-03-22 16:08:32 -04:00
Michael Muller
075ea23f1d
Fix build warning (#1565) 2022-03-22 10:34:36 -04:00
Lai Jiang
3c19d4cbf6
Some code health fixes (#1563)
1. testRun_withPrefix() in RdeUploadActionTest does calls a mock lock
   handler and does not actually try to read from the fake GCS
   implementation. Therefore there's no point settig it up.

2. Remove an unused field in UploadDatastoreBackupActionTest.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/google/nomulus/1563)
<!-- Reviewable:end -->
2022-03-22 09:49:14 -04:00
Rachel Guan
ea4d60c830
Remove static methods in back up actions (#1481)
* Remove static methods in back up actions

* Remove BigqueryPollJob helper class

* Add schedule time in task comparison

* Change payload type from byte[] to ByteString
2022-03-17 17:54:20 -04:00
Lai Jiang
c24e0053c8
Fix a subtle issue in BRDA copy caused by Cloud Tasks (#1556)
* Fix a subtle issue in BRDA copy caused by Cloud Tasks

After the Cloud Tasks migration and #1508, the BRDA copy job now
routinely fail on the first try because the revision update is not
commited by the time the Cloud Tasks job enqueued in the same
transaction runs for  the first time. This is because the enqueueing is
a side effect and not part of the transaction. The job eventually
succeeds because of retries.

This PR attempts to mitigate the initial failure by adding a delay to
the enqueued job, and checking the cursor in the job itself to prevent
it from running before the transaction is commited.
2022-03-17 16:32:07 -04:00
Michael Muller
537a6e4466
Fix issues with saving and deleting gap records (#1561)
* Fix issues with saving and deleting gap records

Datastore limits us to mutating up to 25 records per transaction.  We
sometimes exceed that when deleting expired gap records.  In addition, it is
theoretically possible for us to accumulate enough continuous gap records to
exceed this count while replaying the original transaction.

Deal with deletion by breaking up the gap records to be deleted into a batch
size that is small enough to be deleted transactionally (in practice, we don't
much care about the transactionality but it doesn't seem like we can delete
batches without it).

Deal with the possibility of too many additions by always breaking out gap
record storage and last transaction number updates into their own
transaction(s) (separate from the replay of the original SQL transaction).
2022-03-17 15:09:59 -04:00
Ben McIlwain
6c20d39a2d
Add domain repo ID and token type indexes to AllocationToken table (#1560)
These are useful for the purposes of filtering by one-time/multi-use tokens, and
for determining which one-time tokens have been used (and if so, for which
domain).
2022-03-17 13:58:45 -04:00
Michael Muller
60c156c061
Track and replay Transaction table gaps (#1557)
* Track and replay Transaction table gaps

Id gaps in the Transaction table can be the result of a transactions committed
out of order.  To deal with this, keep track of gaps for up to five minutes
and check to see if they've been back-filled prior to applying the next batch
of transactions during reply.

* Changes for review

* Calculate gap expiration time before gap queries

* Reformat.
2022-03-16 11:08:45 -04:00
Ben McIlwain
742ad0b37c
Add 3 more SQL indexes to the Host table (#1559)
* Add 3 more SQL indexes to the Host table

These indexes on creationTime, deletionTime, and currentSponsorRegistrarId are
present on the other two EPP resource tables (Domain and Contact), and are
useful for a wide variety of operations/analytics queries.
2022-03-15 22:16:36 -04:00
Weimin Yu
6dd6ebce75
Improve cache loading in Registries.java (#1558)
* Improve cache loading in Registries.java

The loader for the TLD cache in Registries.java unnecessarily reads from
another cache when running with SQL, potentially triggering additional
database access. This code runs in the whois query path, and contributes
to the high latency in sandbox.
2022-03-15 22:02:05 -04:00
Ben McIlwain
767e3935af
Make DigestType.fromWireValue() more performant (#1555)
* Make DigestType.fromWireValue() more performant
2022-03-15 13:00:54 -04:00
Rachel Guan
86aa420773
Add a a delay to task in CommitLogCheckpointAction (#1554)
* Add delay to task
2022-03-15 10:49:35 -04:00
Ben McIlwain
61b38569e2
Add domainRepoId indexes to billing events (#1544)
The query analyzer identified this is a missing index on the BillingEvent table,
and I added it for recurrences and cancellations as well as it's likely to be a
problem for them too. "Give me all the billing events associated with a given
domain by its repo ID" seems like a pretty common use case for the DB (and does
appear to be used by our invoicing pipeline).

This is a follow-up to PR #1545.
2022-03-14 17:20:58 -04:00
Ben McIlwain
4bfa19a90c
Add 9 more indexes to SQL schema (#1540)
These indexes were identified as missing by PostgreSQL's query analyzer in our
sandbox environment (where we get enough realistic EPP traffic to identify these
deficiencies).

Note that a lot of the new indexes being named have to use the DB representation
of the column name because they are either embedded or subclassed entities,
whereas most of the existing ones are able to simply refer to Java field names.

This is the Java schema follow-up PR to PR #1541, which is what added the
actual DB changes through Flyway scripts.
2022-03-14 15:59:05 -04:00
Rachel Guan
a001df6d7a
Remove AppEngineServiceUtils dependency (#1534) 2022-03-14 14:10:30 -04:00
Rachel Guan
6caf7819ed
Replace email content with placeholders (#1530)
* Replace email content with placeholders

* Improve sample email wording
2022-03-14 11:30:58 -04:00
Weimin Yu
757803e985
Revise host.inet_addresses query to use gin index (#1550)
* Revise host.inet_addresses query to use gin index
2022-03-09 23:32:17 -05:00
Weimin Yu
4b1f4f96e3
Reorganize new schema changes (#1551)
* Reorganize new schema changes

Reorganized new schema changes and make each flyway script update a
single table.

Each flyway script is executed in a single database transaction so that
the script can be rolled back in one shot. It acquires a shared lock on
all tables touched by the script. This is deadlock-prone because in a
busy database, there may be user queries that attempt to lock the same
set of tables, but in different order. By limiting each script to one
table, we avoid the problem.

We should have some a presubmit check to enforce this rule.

All changes have been deployed to Sandbox out-of-band. When doing so,
we changed all CREATE INDEX statements to CREATE INDEX IF NOT EXISTS.

Future deployments should be able to proceed normally.
2022-03-09 20:47:24 -05:00
Ben McIlwain
bd49e8b238
Add anti-deadlock instructions to DB update README (#1552) 2022-03-09 18:50:15 -05:00
Weimin Yu
6249a8e118
Revise Host index on inet_addresses (#1549)
* Revise Host index on inet_addresses

The index on the 'inet_addresses array column should be of gin or gist
type, which index individual array elements. We use gin for now since
host updates are not often, and gin has better accuracy.

Since flyway script V108__... has not been deployed, we  edit the file
in place instead of adding a new script.

This will be followed up with a modified query that can take advantage
of the gin index. Until then we don't expect to see performance
improvement.

The suspected bottlenect query in the whois path is:

select * from "Host" where 'non-ip-string' = any(inet_address) and
deletion_time < now();

It needs to be revised into:

select * from "Host" where array['non-ip-string'] <@ inet_address and
deletion_time < now();

The combined change reduces the query time from 90ms to 30ms in Sandbox,
and from 150ms to 40ms in production.

It is unclear if this solves all problem with whois latency.
2022-03-08 14:22:01 -05:00
Ben McIlwain
9b7bb12cd1
Add deletionTime/inetAddresses indexes to Host table to support WHOIS (#1548)
* Add deletionTime/inetAddresses indexes to Host table to support WHOIS

Weimin identified these as missing, and being the cause of slowdowns in
NameserverLookupByIpCommand that we're seeing in sandbox.

This is the first of two PRs, adding just the Flyway/schema changes. The
second PR adding the Java object model changes is #1547.
2022-03-07 16:07:11 -05:00
Michael Muller
71d13bab71
Improve Transaction gap processing (#1546)
Skip multiple gaps in one pass and write the correct transaction id to
datastore.
2022-03-07 13:26:18 -05:00
Ben McIlwain
8db28b7e61
Add domainRepoId indexes to billing events (#1545)
* Add domainRepoId indexes to billing events #1544

The query analyzer identified this is a missing index on the BillingEvent table,
and I added it for recurrences and cancellations as well as it's likely to be a
problem for them too. "Give me all the billing events associated with a given
domain by its repo ID" seems like a pretty common use case for the DB (and does
appear to be used by our invoicing pipeline).

This is the first of two PRs that makes just the DB changes. The second PR
(#1544) will add the Java code changes, and will be committed after this one is
deployed.
2022-03-07 12:21:40 -05:00
Lai Jiang
5a7dc307c5
Update the the latest LTS Node version (#1543)
<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/google/nomulus/1543)
<!-- Reviewable:end -->
2022-03-07 11:49:18 -05:00
Ben McIlwain
67278af3cb
Add 9 more indexes to SQL schema (#1541)
* Add 9 more indexes to SQL schema

This indexes were identified as missing by PostgreSQL's query analyzer in our
sandbox environment (where we get enough realistic EPP traffic to identify these
deficiencies).

This is the first of two PRs -- the second PR (#1540) will be merged only after
this one is live in production. Note that this is the PR that actually modifies
the database though, so once this one is deployed we will already have the
benefit of the new indexes.
2022-03-05 17:57:36 -05:00
sarahcaseybot
1eafc983ab
Check signature length in DS records (#1538)
* Check signature length in DS records

* Small fixes

* Add unit tests

* Formatting fix
2022-03-04 15:18:14 -05:00
gbrodman
f1bbdc5a0b
Use built-in Java URL connections instead of UrlFetchService (#1535)
- Use the standard HttpsURLConnection to write/read data
- Rewrite RdeReporter, Nordn*Action, and Marksdb classes and related
  tests to conform to the new format
- Remove FakeURLFetchService and ForwardingUrlFetchService as they weren't used
- Refactor UrlFetchException to UrlConnectionException
- Refactor UrlFetchUtils to UrlConnectionUtils

I will need to test this on Alpha. Fortunately the connections that
don't require auth (e.g. TMDB downloading) should be testable.
2022-03-04 14:16:22 -05:00
Michael Muller
b146301495
Allow replicateToDatastore to skip gaps (#1539)
* Allow replicateToDatastore to skip gaps

As it turns out, gaps in the transaction id sequence number are expected
because rollbacks do not rollback sequence numbers.

To deal with this, stop checking these.

This change is not adequate in and of itself, as it is possible for a gap to
be introduced if two transactions are committed out of order of their sequence
number.  We are currently discussing several strategies to mitigate this.

* Remove println, add a record verification
2022-03-04 09:04:13 -05:00
Weimin Yu
437a747eae
Pass stack trace to validate_datastore user (#1537)
* Pass stack trace to validate_datastore user
2022-03-03 16:10:31 -05:00
Weimin Yu
a620b37c80
Fix hanging test (#1536)
* Fix hanging test

Tests using the TestServerExtension may hang forever if an underlying
component (e.g., testcontainer for psql) fails. This may be the cause
of the some kokoro runs that timeed out after three hours.
2022-03-03 14:43:32 -05:00
Rachel Guan
267cbeb95b
Inject CloudTasksUtil to AsyncTaskEnqueuer (#1522)
* Inject CloudTasksUtil to AsyncTasksEnqueuer

* Rebase

* Remove QUEUE_ASYNC_DELETE from AsyncTasksEnqueuer

* Refactor create() 

* Remove AppEngineServiceUtil depdendency from AsyncTaskEnqueuer
2022-03-02 11:31:45 -05:00
Weimin Yu
b9fcabbc36
Save db discrepancies to GCS (#1527)
* Save db discrepancies to GCS
2022-02-28 16:52:09 -05:00
Weimin Yu
4f33de10f3
Add a tools command to launch SQL validation job (#1526)
* Add a tools command to launch SQL validation job

Stopping using Pipeline.run().waitUntilFinish in
ValidateDatastorePipeline. Flex-templalate does not support blocking
wait in the main thread.

This PR adds a new ValidateSqlCommand that launches the pipeline and
maintains the SQL snapshot while the pipeline is running.

This PR also added more parameters to both ValidateSqlCommand and
ValidateDatastoreCommand:
- The -c option to supply an optional incremental comparison start time
- The -r option to supply an optional release tag that is not 'live',
  e.g., nomulus-DDDDYYMM-RC00

If the manual launch option (-m) is enabled, the commands will print the
gcloud command that can launch the pipeline.

Tested with sandbox, qa and the dev project.
2022-02-28 13:14:57 -05:00
Michael Muller
d6bb83f6d3
Nullify contact fields in setContactFields (#1533)
When setting contact fields from a set of DesignatedContact's, nullify the
existing fields so they don't stick around if they're not in the new set.
2022-02-28 10:57:35 -05:00
Michael Muller
a8d3d22c5a
Don't reset the update time for TLD updates (#1532)
* Don't reset the update time for TLD updates

It turns out that the reason that the Registrar update timestamp isn't updated
for some of the tests is because the record is updated unchanged.  We can
avoid this problem by not trying to update the registrar to the same value.
So in this case, if the registrar alreay contains the TLD we're adding, don't
try to add it.
2022-02-25 13:09:36 -05:00
Rachel Guan
fac659b520
Inject CloudTasksUtils to DomainLockUtils (#1519)
* Move enqueueDomainRelock to DomainLockUtils

* Rebase and improve PR

* Inject CloudTaskUtils to DomainLockUtils
2022-02-25 11:38:38 -05:00
gbrodman
178702ded3
Fix DTR creation in one location and clean up replay comparison (#1529)
* Fix DTR creation in one location and clean up replay comparison
2022-02-23 11:07:10 -05:00
Lai Jiang
59bca1a9ed
Disable sending cert expiration emails on sandbox (#1528) 2022-02-22 14:46:27 -05:00
Michael Muller
f8198fa590
Do full database comparison during replay tests (#1524)
* Fix entity delete replication, compare db @ replay

Replay tests currently only verify that the contents of a transaction are
can be successfully replicated to the other database.  They do not verify that
the contents of both databases are equivalent.  As a result, we miss any
changes omitted from the transaction (as was the case with entity deletions).

This change adds a final database comparison to ReplayExtension so we can
safely say that the databases are in the same state.

This comparison is introduced in part as a unit test for the one-line fix for
replication of an "entity delete" operation (where we delete using an entity
object instead of the object's key) which so far has only affected PollMessage
deletion.  The fix is also included in this commit within
JpaTransactionManagerImpl.

* Exclude tests and entities with failing comparisons

* Get all tests to pass and fix more timestamp

Fix most of the unit tests that were broken by this change.

- Fix timestamp updates after grace period changes in DomainContent and for
  TLD changes in Registrar.
- Reenable full database comparison for most DomainCreateFlowTest's.
- Make some test entities NonReplicated so they don't break when used with
  jpaTm().delete()
- Diable checking of a few more entity types that are failing comparisons.
- Add some formatting fixes.

* Remove unnecessary "NoDatabaseCompare"

I turns out that after other fixes/elisions we no longer need these for
any tests in DomainCreateFlowTest.

* Changes for review

* Remove old "compare" flag.

* Reformatted.
2022-02-22 10:49:57 -05:00
Lai Jiang
bbac81996b
Make a few quality-of-life improvements in CloudTasksUtils (#1521)
* Make a few quality-of-life improvements in CloudTasksUtils

1. Update the method names. There are too many overloaded methods and it
   is hard to figure out which one does which without checking the
   javadoc.

2. Added a method in the task matcher to specify the delay time in
   DateTime, so the caller does not need to convert it to Timestamp.

3. Remove the expilict dependency on a clock when enqueueing a task with
   delay, the clock is now injected directly into the util instance
   itself.
2022-02-18 20:21:56 -05:00
Ben McIlwain
52c759d1db
Disable prober data deletion cron job in prod & sandbox (#1525)
* Disable prober data deletion cron job in prod & sandbox

This is going to unnecessarily make the database migration more complex, and we
don't need them that badly. We'll re-enable these cron jobs once we've written
the new version of this action that handles Cloud SQL correctly (the current
version only does Datastore anyway).
2022-02-17 08:46:40 -08:00
Weimin Yu
453af87615
Ignore prober data when comparing databases (#1523)
* Ignore prober data when comparing databases

Completely ignore prober data when comparing Datastore and SQL.

Prober data deletions are not propagated from Datastore to SQL. It is
difficult to distinguish soft-deletes from normal updates, therefore
difficult to avoid false positives when looking for differences.
2022-02-15 12:01:20 -05:00