This involves:
- Altering both transaction managers to check for a read-only mode at
the start of standard write actions (e.g. delete, put); see the sketch
after this list.
- Altering both raw layers (entity manager, ofy) to throw exceptions on
write actions as well.
- Implementing bypass routes for reading / setting / removing the schedule
itself so that we don't get "stuck".
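A minimal sketch of the idea, assuming illustrative names (MigrationSchedule, putWithoutReadOnlyCheck, ReadOnlyModeException) rather than the actual Nomulus classes:

```java
import org.joda.time.DateTime;

public class ReadOnlyCheckSketch {

  /** Thrown when a write is attempted while the schedule says we are read-only. */
  public static class ReadOnlyModeException extends IllegalStateException {
    ReadOnlyModeException() {
      super("The database is currently in read-only mode");
    }
  }

  /** Stand-in for the migration schedule lookup (assumption, for illustration only). */
  interface MigrationSchedule {
    boolean isReadOnlyAt(DateTime now);
  }

  private final MigrationSchedule schedule;

  ReadOnlyCheckSketch(MigrationSchedule schedule) {
    this.schedule = schedule;
  }

  /** Standard write path: consults the schedule before delegating. */
  public void put(Object entity, DateTime now) {
    if (schedule.isReadOnlyAt(now)) {
      throw new ReadOnlyModeException();
    }
    putWithoutReadOnlyCheck(entity);
  }

  /** Bypass route used when saving the schedule itself, so we never get "stuck". */
  public void putWithoutReadOnlyCheck(Object entity) {
    // ... delegate to the underlying entity manager / ofy save ...
  }
}
```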
* Clean up ReplicateToDatastoreAction and tests
1. applyTransaction should throw an error if it fails; this allows us to
have more information in the caller (and it shouldn't usually happen)
2. Set a response code + payload now, since this is an action that is
called by cron
3. Add a method to the test log subject that allows us to check whether a
severe log with a particular Throwable cause was logged, since the cause
isn't contained directly in the log message itself; see the sketch after
this list.
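A hedged sketch of the kind of assertion described in item 3, using Guava's TestLogHandler; the real log subject in the codebase differs, but the key point is inspecting LogRecord#getThrown() rather than the message text:

```java
import com.google.common.testing.TestLogHandler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

public class LogAssertions {

  /** Asserts that a SEVERE record was logged whose cause is of the given type. */
  public static void assertSevereLogWithCause(
      TestLogHandler handler, Class<? extends Throwable> causeType) {
    for (LogRecord record : handler.getStoredLogRecords()) {
      if (record.getLevel().equals(Level.SEVERE)
          && record.getThrown() != null
          && causeType.isInstance(record.getThrown())) {
        return;
      }
    }
    throw new AssertionError(
        "Expected a SEVERE log with cause " + causeType.getSimpleName() + " but found none");
  }
}
```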
* Save indexes when replaying EppResources SQL->DS
We implement this similarly to how we implement the
beforeSqlSaveOnReplay callback in the other direction -- a
beforeDatastoreSaveOnReplay method that is called when replaying a
Mutation to Datastore. This means that the asynchronous replay will
create the relevant ForeignKeyIndex and EppResourceIndex objects for
EppResources saved when SQL is primary.
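A rough sketch of the callback shape, assuming it mirrors beforeSqlSaveOnReplay; the signature here (returning the index entities to save) is an illustrative assumption rather than the actual implementation:

```java
import com.google.common.collect.ImmutableList;

/** Sketch of the replay hook; the method name is from the description, details assumed. */
public interface DatastoreReplayHooks {

  /**
   * Invoked when a SQL-originated Mutation for this entity is replayed to Datastore.
   * Implementations return extra Datastore-only entities (e.g. ForeignKeyIndex,
   * EppResourceIndex) to be saved in the same replay transaction as the resource.
   */
  default ImmutableList<Object> beforeDatastoreSaveOnReplay() {
    return ImmutableList.of();
  }
}
```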
* Implement a util class to manage push queues using Cloud Tasks API
Push queues were part of App Engine when it debuted. As a result, the
Task Queue API was part of the App Engine SDK and can only be used in
the App Engine classic runtime. The new Cloud Tasks API can be used in any
runtime, but it only supports push queues. In this PR we implement a util
class (CloudTasksUtils), like TaskQueueUtils, to handle enqueuing tasks to
push queues using Cloud Tasks. One action (TldFanoutAction) was
converted to use the new API as a demo. Mass migration of other call sites of
the old API will follow in a separate PR.
TESTED=deployed to alpha and verified that tasks are correctly enqueued
and executed.
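For illustration, a minimal sketch of how such a util class might enqueue a task with the Cloud Tasks v2 client; the wrapper name, constructor arguments, and enqueue method are assumptions (the real CloudTasksUtils likely differs), while the CloudTasksClient calls are the standard API:

```java
import com.google.cloud.tasks.v2.AppEngineHttpRequest;
import com.google.cloud.tasks.v2.CloudTasksClient;
import com.google.cloud.tasks.v2.HttpMethod;
import com.google.cloud.tasks.v2.QueueName;
import com.google.cloud.tasks.v2.Task;
import java.io.IOException;

public class CloudTasksUtilsSketch {

  private final String projectId;
  private final String locationId;

  public CloudTasksUtilsSketch(String projectId, String locationId) {
    this.projectId = projectId;
    this.locationId = locationId;
  }

  /** Enqueues a GET task for the given relative URI on the given push queue. */
  public Task enqueue(String queueName, String relativeUri) throws IOException {
    try (CloudTasksClient client = CloudTasksClient.create()) {
      Task task =
          Task.newBuilder()
              .setAppEngineHttpRequest(
                  AppEngineHttpRequest.newBuilder()
                      .setHttpMethod(HttpMethod.GET)
                      .setRelativeUri(relativeUri)
                      .build())
              .build();
      return client.createTask(QueueName.of(projectId, locationId, queueName), task);
    }
  }
}
```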
Add double-replay to the Host*Flow tests to show how this works. The
only change to the double replay itself is that now we store the
Datastore entity in the TransactionEntity object -- this is because we
use Objectify to serialize the objects into bytes and we need it to know
about the entity in question.
* Add DS->SQL replay cron job to production
This won't do anything until we set the migration schedule to
DATASTORE_PRIMARY. Actions in order:
1. Add this cron job (it'll be a no-op)
2. Run the init-sql-pipeline to populate production's SQL DB
3. Set the SqlReplayCheckpoint to a time shortly (maybe 30 minutes) before
the smear backup used in step #2
4. Set the database migration schedule to transition to
DATASTORE_PRIMARY at some point
* Remove ReservedList from Datastore schema
* Remove some Datastore references
* Add a different non-replicated entity to ReplayCommitLogsToSqlActionTest
We shouldn't reference tm() at all before initializing the JPA
transaction manager, since tm() looks at the database migration schedule
when figuring out which transaction manager to use.
* Remove recursive load in the DatabaseMigrationStateSchedule (DBMSS) cache
The recursion occurs because, in a standard transaction, the JpaTxnManager
checks whether we should be doing backups, which involves loading the
migration state schedule (hence the recursion). When starting the
transaction to load the schedule, we should explicitly use
transactWithoutBackup so there's no need to check.
This wasn't hit in tests because we previously manually set the
replication to not occur in the JpaTransactionManagerExtension -- we
remove that and related setters.
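A minimal sketch of the fix, with the transaction manager reduced to a stub interface; the names are assumptions, but the point is that the schedule-loading path calls the no-backup entry point and therefore never re-enters the schedule lookup:

```java
import java.util.function.Supplier;

public class ScheduleCacheSketch<T> {

  /** Stand-in for the transaction manager; only the relevant entry points are shown. */
  interface TxnManager {
    // Normal path: decides whether to record backups, which requires the schedule.
    <V> V transact(Supplier<V> work);

    // Bypass path: no backup decision, so no schedule lookup is needed.
    <V> V transactWithoutBackup(Supplier<V> work);
  }

  private final TxnManager txnManager;
  private final Supplier<T> loader;

  ScheduleCacheSketch(TxnManager txnManager, Supplier<T> loader) {
    this.txnManager = txnManager;
    this.loader = loader;
  }

  /** Loads the schedule without triggering the recursive schedule lookup. */
  public T load() {
    // Using transact() here would consult the schedule while loading the
    // schedule, which is exactly the recursion being removed.
    return txnManager.transactWithoutBackup(loader);
  }
}
```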
* Generate string to uniquely identify a SqlEntity
Add a method to SqlEntity that returns a string built from the entity's
primary key(s). This string can be used in logging.
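A hedged sketch of such a method; reflecting over @Id fields is just one way to do it and is not necessarily how the real SqlEntity method is implemented:

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import javax.persistence.Id;

public interface SqlEntitySketch {

  /** Returns "ClassName: [id1, id2, ...]" built from all @Id fields, for logging. */
  default String getPrimaryKeyString() {
    List<String> ids = new ArrayList<>();
    for (Field field : getClass().getDeclaredFields()) {
      if (field.isAnnotationPresent(Id.class)) {
        field.setAccessible(true);
        try {
          ids.add(String.valueOf(field.get(this)));
        } catch (IllegalAccessException e) {
          throw new RuntimeException(e);
        }
      }
    }
    return getClass().getSimpleName() + ": " + ids;
  }
}
```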
* Add the domain DNS refresh request time field to the DB schema
This isn't used yet, but it will eventually be the replacement for the dns-pull
task queue once we get further in the migration.
* Remove index
* Store DatabaseMigrationSchedule in SQL instead of Datastore
This requires messing around with some of the JPA unit test rule
creation since it requires saving / retrieving the schedule pretty much
always (which itself includes the hstore extension).
This performs a direct load-by-key (the most efficient Datastore operation),
rather than attempting to load all entities by type using an ancestor query. The
existing implementation is possibly more error-prone as well, and might be
responsible for the "cross-group transaction need to be explicitly specified"
error we're seeing.
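For contrast, a sketch of a direct load-by-key with Objectify; the entity type and id here are made up, and only the load pattern is the point:

```java
import static com.googlecode.objectify.ObjectifyService.ofy;

import com.googlecode.objectify.Key;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;

public class ScheduleLoadSketch {

  /** Hypothetical singleton entity; the real kind and id are assumptions. */
  @Entity
  public static class MigrationScheduleEntity {
    @Id long id = 1;
  }

  /** Direct load by key: the cheapest Datastore read, with no ancestor query involved. */
  public static MigrationScheduleEntity loadByKey() {
    return ofy().load().key(Key.create(MigrationScheduleEntity.class, 1)).now();
  }
}
```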
* Remove PremiumList from Datastore schema
* Remove commented out code
* Change lastUpdateTime to creationTimestamp
* Remove extra file
* Remove currency unit from input data to parse
* Revert extra file
* Check currency in parse
* Create all PremiumEntries before saving them in bulk
* Small fixes
* Fix merge conflict
There was a subtle issue, difficult to replicate, that we encountered in
sandbox when using one transaction per file. Basically:
1. Save a domain with dsData
2. Save the domain without dsData
3. Save the domain with the same dsData as step 1
4. Delete literally any object
If one performs steps 2-4 in the same transaction, Hibernate will throw
an exception (cascade re-saving a cascade-deleted object). Note that
step 4 is in fact necessary to reproduce the issue, yay Hibernate.
We will test this and, if one SQL transaction per Datastore transaction is
too slow, we'll figure out ways to reduce the number of SQL transactions.
After the migration to the new GCS API, it became apparent that the
BlobId.of() method needs to take the bucket name (without any trailing
directories) as the first argument. I searched for all occurrences of
"BlobId.of" in the code base and verified that the ICANN reporting job was
the only place where the API was misused.
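A small example of the pitfall (the bucket and object names are made up):

```java
import com.google.cloud.storage.BlobId;

public class BlobIdExample {
  public static void main(String[] args) {
    // Correct: the bare bucket name first, then the full object path.
    BlobId ok = BlobId.of("my-reporting-bucket", "icann/monthly/2021-09/report.csv");

    // Wrong: a "bucket" argument with trailing directories does not resolve to
    // the intended object.
    BlobId broken = BlobId.of("my-reporting-bucket/icann/monthly", "2021-09/report.csv");

    System.out.println(ok + " vs " + broken);
  }
}
```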
This PR re-implements most of the logic in the RdeStagingReducer, with
the exception of the final enqueue operations, because the task queue API
is not available outside of the App Engine SDK. That part will come in a
separate PR.
Another deviation from the reducer is that we forwent the lock -- it is
difficult to hold one across different Beam transforms. Instead, we write
each report to a different folder according to the unique Beam job name.
When enqueueing the publish tasks, we will then pass the folder prefix as a
URL parameter.
This replaces the current configuration parameter.
In addition, this adds some helpers to DatabaseHelper to make the
transitions easier, since we more frequently need to alter + reset the
schedule.
If we use the transaction manager methods, JpaTransactionManagerImpl
will attempt to detach the EppResource that we're loading -- this fails
because that entity has already been saved in the same transaction. We
don't need detaching during these methods (it's just for resource
population), so we can use raw loads to get around it.
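In plain JPA terms, the "raw load" amounts to something like the sketch below; the generic helper is illustrative, not the actual code. EntityManager#find returns the managed entity without the extra detach step that the wrapped load performs.

```java
import javax.persistence.EntityManager;

public class RawLoadSketch {

  /** Loads the resource inside the current transaction without detaching it. */
  public static <T> T loadRaw(EntityManager entityManager, Class<T> clazz, String repoId) {
    return entityManager.find(clazz, repoId);
  }
}
```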
* Remove KmsSecret model entities
Now that we have been using the SecretManager for almost a month, remove
the KmsSecret and KmsSecretRevision entities from the Java code base.
A follow-up PR will drop the relevant tables in the schema.
Also removed a few unused classes in the beam package.
* Fix runtime issues with commit-log-to-SQL replay
- We now use a more intelligent prefix to narrow the listObjects search
space in GCS. Without it, we return >30k objects, which can take roughly
50 seconds; with the prefix, listObjects takes 1-3 seconds.
- We now search hour by hour to make efficient use of the prefixing.
Basically, we keep searching for new files until we hit the current time
or until we hit the overall replay timeout.
- Dry-run only prints out the first hour's worth of files.
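A hedged sketch of the prefix-narrowed, hour-by-hour listing; the bucket layout and the "commit_diff_until_" file-name pattern are assumptions for illustration, while the Storage calls are the standard GCS client API:

```java
import com.google.api.gax.paging.Page;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class CommitLogListingSketch {

  private static final DateTimeFormatter HOUR_PREFIX =
      DateTimeFormat.forPattern("yyyy-MM-dd'T'HH");

  /** Lists the diff files whose timestamps fall within the given hour. */
  public static Iterable<Blob> listDiffsForHour(String bucket, DateTime hour) {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    // e.g. "commit_diff_until_2021-09-01T13" -- far narrower than listing every
    // object in the bucket.
    String prefix = "commit_diff_until_" + HOUR_PREFIX.print(hour);
    Page<Blob> page = storage.list(bucket, Storage.BlobListOption.prefix(prefix));
    return page.iterateAll();
  }
}
```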