Attempting to run DeleteOldCommitLogs in prod resulted in a lot of DatastoreTimeoutException errors. The assumption is that attempting to load so many CommitLogManifests (over 200 million of them), when each one has a slight possibility of failure, has a very high probability of error.
The shard aborts after 20 of these errors. By eliminating as many loads as possible and retrying the remaining loads inside a transaction, we effectively prevent exceptions from "leaking" out to the mapreduce framework, which will hopefully keep us below 20. At least, that's our current best guess as to why the mapreduce fails.
EppResources are loaded in the map stage to get the revisions, and CommitLogManifests are only loaded in the reduce stage as a sanity check so we don't accidentally delete resources we need in prod. Both of these loads are wrapped in transactNew to make sure they retry individually.
The only "load" not done inside a transaction is the EppResourceIndex, but there's no getting around that without rewriting the EppResourceInputs.
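As a rough illustration of the retry idea above, here is a minimal sketch in plain Java. It is not the transactNew implementation: RetryingLoader, TransientLoadException, and loadWithRetries are hypothetical names, and the real code retries inside a datastore transaction rather than in a bare loop.

```java
import java.util.function.Supplier;

/** Sketch only: retries a load so transient failures do not escape to the caller. */
final class RetryingLoader {

  /** Hypothetical stand-in for a transient datastore failure such as a timeout. */
  static final class TransientLoadException extends RuntimeException {
    TransientLoadException(String message) {
      super(message);
    }
  }

  /** Runs the load up to maxAttempts times and rethrows only the final failure. */
  static <T> T loadWithRetries(Supplier<T> load, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    TransientLoadException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return load.get();
      } catch (TransientLoadException e) {
        lastFailure = e; // Swallow and retry; nothing leaks unless every attempt fails.
      }
    }
    throw lastFailure;
  }
}
```

With each load retried individually, a single transient failure costs one extra attempt instead of counting toward the shard's error limit.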
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=164176764
When the registrar console code determines that a user has not logged in, it redirects to a login page. But when authenticating as an internal request (which should never happen), the redirection code encountered an exception, resulting in a 500 error.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163867018
This completes the data/functionality migration for multiple DNS writers.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163835077
It was buggy (didn't work) and was never actually used.
Why never actually used: for it to be used, executeWithLock has to be called
with different requesters on the same lockId. That never happened in the code.
How it was buggy: Logically, the queue is deleted on release of the lock (meaning it was
meaningless the only time it mattered - when the lock isn't taken). In
addition, a different bug meant that having items in the queue prevented the
lock from being released forcing all other tasks to have to wait for lock
timeout even if the task that acquired the lock is long done.
Alternative: fix the queue. This would mean we don't want to delete the lock on release (since we want to keep the queue). Instead, we would resave the same lock with its expiration date set to START_OF_TIME. In addition, we would need to fix the .equals used to determine whether the lock is the same as the acquired lock, and instead use some isSame function that ignores the queue.
Note: the queue is dangerous! An item (calling class / action) in the first place of a queue means no other calling class can get that lock. Everything is waiting for the first calling class to be re-run - but that might take a long time (depending on that action's rerun policy) and even might never happen (if for some reason that action decided it was no longer needed without acquiring the lock) - causing all other actions to stall forever!
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163705463
This will make DNS issues easier to debug retroactively as we will be
able to determine, by looking at the logs, if the queue size was growing
unbounded.
Also adds some logging helpers to allow programmatically choosing the level
of logging.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163123783
This makes the code more understandable from callsites, and also forces
users of this function to deal with the situation where the registrar
with a given client ID might not be present (it was previously silently
NPEing from some of the callsites).
This also adds a test helper method loadRegistrar(clientId) that retains
the old functionality for terseness in tests. It also fixes some instances
of using the load method with the wrong cachedness: some uses in high-traffic
situations (WHOIS) should have caching, while low-traffic reporting uses
don't benefit from caching and so might as well always be
current.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162990468
"The passage of time" caused the test to start failing because the test data
given by ICANN includes certificates that expire in 2017.
Using a fake clock to make sure the "now" date is always in the valid
certificate range solves this issue.
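A minimal sketch of the fake-clock approach, assuming java.time.Clock rather than whatever clock abstraction the test actually uses. CertificateValidityCheck and isValidAt are hypothetical names, and the dates in the usage note are made up for illustration.

```java
import java.time.Clock;
import java.time.Instant;

/** Sketch only: validity is checked against an injected clock, not the wall clock. */
final class CertificateValidityCheck {

  /** Returns true if the clock's current instant lies within [notBefore, notAfter]. */
  static boolean isValidAt(Clock clock, Instant notBefore, Instant notAfter) {
    Instant now = clock.instant();
    return !now.isBefore(notBefore) && !now.isAfter(notAfter);
  }
}
```

A test can then pin "now" with Clock.fixed(Instant.parse("2017-01-01T00:00:00Z"), ZoneOffset.UTC), so the outcome no longer depends on when the test runs.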
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162987171
This also brings the SQL template parameters in line with the anticipated BigQuery dataset format, and switches from DateTime to the more appropriate LocalDate (since we only need monthly granularity).
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162747692
Note that even though the nomulus command line tool now supports multiple
DNS writers for all subcommands, this still won't work quite yet because
the DNS task queue format migration from [] is still in progress.
After next week's push that migration will be complete and we can remove
the final restriction against only having one DNS writer per TLD.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162490399
This is written in such a way that it can safely handle task items in the
old format so long as the DNS writer to use for the given TLD is unambiguous
(which it is for now, until we allow multiple DNS writers to be configured).
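The fallback described above could look roughly like the following sketch. DnsWriterResolver and resolve are hypothetical names, and representing an old-format task as a null writer name is an assumption made for illustration.

```java
import java.util.Set;

/** Sketch only: picks the DNS writer for a task item in either the old or new format. */
final class DnsWriterResolver {

  /**
   * Returns the writer named in the task, or, for old-format tasks (null writer),
   * the TLD's single configured writer. Fails if the choice would be ambiguous.
   */
  static String resolve(String writerFromTask, Set<String> writersForTld) {
    if (writerFromTask != null) {
      return writerFromTask; // New format: the task item says which writer to use.
    }
    if (writersForTld.size() == 1) {
      return writersForTld.iterator().next(); // Old format, but unambiguous.
    }
    throw new IllegalStateException(
        "Old-format task item is ambiguous: TLD has " + writersForTld.size() + " DNS writers");
  }
}
```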
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162293412
We want to be safer and more explicit about the authentication needed by the many actions that exist.
As such, we make the 'auth' parameter required in @Action (so it's always clear who can run a specific action) and we replace the @Auth with an enum so that only pre-approved configurations that are aptly named and documented can be used.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162210306
Also updates Truth version to 0.34 where the replacement method was added.
More information: []
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161970305
This allows us to have a modular view of all tables used in activity reporting, to facilitate generating reports in BigQuery.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161849007
This standardizes use of annotations/inheritance/formatting across
tests, to make the code more legible and consistent.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161810734
This paves the way for [] which expands the set of classes Blaze will check for possible test methods that are not properly annotated.
For more details and FAQs please see: []
Tested:
TAP --sample for global presubmit queue
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161795590
After this point all data is migrated to use the new canonical
plural version, and subsequent code changes can be made that use
multiple writers.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161673486
This makes it take a lot less time to run (roughly a 10X speedup).
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161666391
Now instead of deleting "all logs older than X", we delete "all logs older than
X that don't have any EppResource.getRevision()" pointing to them.
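The new deletion rule can be sketched as a simple filter. This is an in-memory illustration only, not the mapreduce itself: CommitLogPruner and manifestsToDelete are hypothetical names, and manifests are represented by plain string IDs.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch only: selects commit logs that are both old and unreferenced. */
final class CommitLogPruner {

  /**
   * Returns IDs of manifests committed before the threshold that no resource
   * revision points to. Referenced manifests are kept regardless of age.
   */
  static Set<String> manifestsToDelete(
      Map<String, Instant> commitTimes, Set<String> referencedByRevisions, Instant threshold) {
    Set<String> toDelete = new TreeSet<>();
    for (Map.Entry<String, Instant> entry : commitTimes.entrySet()) {
      boolean isOld = entry.getValue().isBefore(threshold);
      boolean isReferenced = referencedByRevisions.contains(entry.getKey());
      if (isOld && !isReferenced) {
        toDelete.add(entry.getKey());
      }
    }
    return toDelete;
  }
}
```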
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161455827
This is the first step in a multi-step data migration to allow multiple
DNS writers per TLD. The overall process looks like this:
1. Add a plural DNS writers field with backfill (this commit).
2. Deploy it.
3. Run the ResaveEnvironmentEntitiesCommand to populate this new field
on all entities.
4. Update the code to use the new field everywhere.
5. Deploy it.
6. Delete the now-unreferenced, old deprecated singular value field.
This process is rollback-safe.
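Step 1 (a plural field with backfill) might be sketched as below. TldConfig and its field names are hypothetical stand-ins, and doing the backfill in the getter is an assumption; the actual entity may backfill at load time instead.

```java
import java.util.Collections;
import java.util.Set;

/** Sketch only: an entity carrying both the old singular and new plural field. */
final class TldConfig {
  private final String dnsWriter; // Old singular field, to be deleted in step 6.
  private final Set<String> dnsWriters; // New plural field; null on unmigrated entities.

  TldConfig(String dnsWriter, Set<String> dnsWriters) {
    this.dnsWriter = dnsWriter;
    this.dnsWriters = dnsWriters;
  }

  /** Returns the plural value, backfilled from the singular value when not yet populated. */
  Set<String> getDnsWriters() {
    if (dnsWriters != null && !dnsWriters.isEmpty()) {
      return Collections.unmodifiableSet(dnsWriters);
    }
    return dnsWriter == null
        ? Collections.<String>emptySet()
        : Collections.singleton(dnsWriter);
  }
}
```

Because readers of the plural field see the same value before and after the resave, code from either side of the migration can read the entity, which is what makes rollback safe.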
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161253436
The billing account map will be serialized in the following format:
{currency1=id1, currency2=id2, ...}
In order for the output to be deterministic, the billing account map is stored as a sorted map.
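To illustrate why a sorted map gives deterministic output, here is a minimal sketch using java.util.TreeMap with plain string keys. BillingAccountMapFormatter is a hypothetical name, and the real map is presumably keyed by a currency type rather than a String.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: serializes a billing account map in key order. */
final class BillingAccountMapFormatter {

  /** Copies the entries into a TreeMap so iteration, and thus toString(), is sorted by key. */
  static String serialize(Map<String, String> billingAccountMap) {
    return new TreeMap<>(billingAccountMap).toString();
  }
}
```

For a map holding USD=id1 and JPY=id2, this yields {JPY=id2, USD=id1} regardless of insertion order.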
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161075814
This is the first step in moving the current []cron-Python reporting scripts
into App Engine, as an official part of the Nomulus package. This copies the
structure of RDE uploads, with a few changes specific to monthly reporting.
I've left some TODOs related to actually testing it on the ICANN endpoint, as we're still not sure how files to be uploaded will be staged, and whether we can actually ping their endpoint on valid ports (80 or 443).
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=160408703
This will allow us to migrate one TLD at a time by refreshing all zones
on the specified TLD after dual-writing is enabled.
Note that the TLDs parameter is required, which seems reasonable given
that almost all imagined use cases would be on a by-TLD basis.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=160294546
I'm moving it out of the scrap folder too because there's nothing else
in there and we do want to retain this indefinitely because it's a useful
tool for performing DNS writer migrations.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=160168902
Up to now, our search wildcard rules have been that there must be an initial string of at least two characters. If a wildcard is present after that, it can optionally be followed by a suffix specifying the TLD (for domains) or domain (for nameservers). So domain queries can look like:
example.tld
ex*
ex*.tld
and nameserver queries can look like:
ns1.example.tld
ns*.example.tld
ns*
But you can't do a domain query for *.tld, nor a nameserver query for *.example.tld. It would be nice to support such queries, and the presence of a valid TLD or domain makes them relatively efficient. This CL relaxes the restrictions to allow wildcards with no initial string if the suffix is present. For nameservers, the suffix must be a valid domain in the system, to avoid having to loop through all nameservers.
A side effect of the changes is to fix a shortcoming in the logic which caused wildcard nameserver searches to fail if the specified domain suffix referred to an external domain.
Entity searches are not affected, since they do not support suffixes.
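The relaxed rules can be summarized with the following sketch of a syntactic check. WildcardQuery and isAllowed are hypothetical names. The sketch assumes that any initial string length is acceptable once a suffix is present, and it does not model the requirement that a nameserver suffix be a valid domain in the system.

```java
/** Sketch only: syntactic check of a search pattern under the relaxed wildcard rules. */
final class WildcardQuery {
  private static final int MIN_INITIAL_STRING_LENGTH = 2;

  static boolean isAllowed(String query) {
    int star = query.indexOf('*');
    if (star < 0) {
      return !query.isEmpty(); // No wildcard: an exact-match query.
    }
    if (query.indexOf('*', star + 1) >= 0) {
      return false; // At most one wildcard.
    }
    String initialString = query.substring(0, star);
    String afterWildcard = query.substring(star + 1);
    if (afterWildcard.isEmpty()) {
      // No suffix: the original minimum-length rule still applies.
      return initialString.length() >= MIN_INITIAL_STRING_LENGTH;
    }
    // With a suffix such as ".tld" or ".example.tld", the initial string may be empty.
    return afterWildcard.startsWith(".") && afterWildcard.length() > 1;
  }
}
```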
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=159856563
The affected actions have been changed to check that the user is logged in by [] so this attribute is no longer needed.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=159572365
We are going to remove the requireLogin attribute from the action attribute, because it is specific to the UserService API. This is used by four actions:
ConsoleUIAction
RegistrarSettingsAction
RegistrarPaymentSetupAction
RegistrarPaymentAction
Instead, these four actions will now check the login status directly.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=159562335
Now that the registration period has been added to DomainApplication, we
can remove this @OnLoad that was populating it for objects that were
missing the period.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=159464438
When doing update_registrar, it is now possible to only specify the currencies and the account ids that need updating.
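One way to picture the partial update is as a merge of the specified entries over the existing map. This is a sketch under that assumption; BillingAccountUpdate and apply are hypothetical names, not the command's actual code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: applies a partial billing account update on top of the existing map. */
final class BillingAccountUpdate {

  /** Entries in updates override or extend existing; unmentioned currencies are kept. */
  static Map<String, String> apply(Map<String, String> existing, Map<String, String> updates) {
    Map<String, String> merged = new TreeMap<>(existing);
    merged.putAll(updates);
    return merged;
  }
}
```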
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=159262119
The --superuser option in the nomulus command-line tool should
bypass checks on whether the passed-in registrar client ID has access
to the TLD in question, but currently it does not.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=158974462
These shouldn't ever be null, but we have some bad data in production
for prober TLDs left over from the Registry 2.0 transition. Ignoring
null values here is required to finish cleanup for this old data, which
currently cannot even be deleted because it's throwing an NPE when
trying to update these values.
This commit will be reverted after the bad data is cleaned up, likely
sometime next week.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=158546840
PDT testing revealed a couple ways in which our WHOIS output was non-compliant. First, the Consistent Labeling & Display policy dictates that the contact IDs must be ROIDs. See rule 11 in https://www.icann.org/resources/pages/rdds-labeling-policy-2017-02-01-en. Second, PDT tests expect that a WHOIS response will treat missing values either by omitting the line entirely, or by including the line with a blank value, but not both. So this is legal:
Phone Number: 123-4567
Phone Number Ext:
Fax Number: 123-4568
Fax Number Ext:
and this is legal:
Phone Number: 123-4567
Fax Number: 123-4568
but this is not:
Phone Number: 123-4567
Phone Number Ext:
Fax Number: 123-4568
In the last example, one extension line is present with a blank value, while the other extension line is omitted. We cannot do both. Therefore, we should update our code to omit lines with no value. Since we can't guarantee that we will always emit all lines that the parser might expect to see, it is safe to use the policy of omitting lines with no value.
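The omit-when-empty policy amounts to something like the sketch below. WhoisEmitter and emit are hypothetical names, and the label/value map is a simplification of however the WHOIS response is really built.

```java
import java.util.Map;

/** Sketch only: emits "Label: value" lines, skipping any field with no value. */
final class WhoisEmitter {

  /** Fields are emitted in the map's iteration order; null or empty values are omitted. */
  static String emit(Map<String, String> fields) {
    StringBuilder out = new StringBuilder();
    for (Map.Entry<String, String> field : fields.entrySet()) {
      String value = field.getValue();
      if (value == null || value.isEmpty()) {
        continue; // Never emit a label with a blank value.
      }
      out.append(field.getKey()).append(": ").append(value).append('\n');
    }
    return out.toString();
  }
}
```

Since blank lines are never emitted, the output cannot mix the two styles.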
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=158184150
Memcache is already off but now it's not in the code anymore.
This includes removing domain creation failfast, since that is actually
slower now than just running the flow - all you gain is a non-transactional
read over a transactional read, but the cost is that you always pay that
read, which is going to drive up latency.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=158183506
I think this comment was meant to be the justification for not using "localhost" (aka a hostname) in the URLs, because jsch would mangle it. However, we already cut over to using "localhost" in [] to avoid a dependency on IPv4, and it's been fine. So this comment no longer makes sense.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=158063880
Changed [] to use v1 instead of v1beta1, and replaced v1beta1 with v1 in all the java files.
If there are special build rules for open-source etc. that also need to be updated, or non "TAP-able" tests that need to be run, please check and see if they are OK.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=157895888
The command was set up such that an update without any contact types specified would clear out the list, instead of leaving them unchanged, as it should.
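The intended behavior distinguishes "not specified" from "specified as empty". A minimal sketch, assuming an omitted flag is represented as null; ContactTypeUpdate and resolve are hypothetical names.

```java
import java.util.Set;

/** Sketch only: decides which contact types to store after an update. */
final class ContactTypeUpdate {

  /**
   * Returns the current types when none were requested (null), otherwise the
   * requested types, which may be an explicitly empty set.
   */
  static Set<String> resolve(Set<String> current, Set<String> requested) {
    return requested == null ? current : requested;
  }
}
```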
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=157766429
[] opened up the settings on RdeStaging, in order to make it usable by the nomulus tool. But in retrospect, we think that all we needed to do was support the POST method, not loosen the auth settings, since nomulus invokes RdeStaging via a task queue. Removing the looser auth settings will bring this action into line with other backend actions.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=157262595
This brings the affected actions into line with the settings on other similar actions.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=157259842
Convert RestoreCommitLogsCommandTest use of generics and mockito to a form
that works with the kokoro build:
- Replace ImmutableMap<String, Object> with ImmutableMap<String, ?>.
- Replace any() as a matcher for MediaType with an "eq()" matcher.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=157148910