Commit graph

1052 commits

Author SHA1 Message Date
guyben
cf94d69a3e Map over Key instead of actual instances when deleting old commit logs
Attempting to run DeleteOldCommitLogs in prod resulted in a lot of DatastoreTimeoutException errors. The assumption is that attempting to load so many CommitLogManifests (over 200 million of them), when each one has a slight possibility of failure, has a very high probability of error.

The shard aborts after 20 of these errors. By eliminating as many loads as possible and retrying the remaining loads inside a transaction, we effectively stop exceptions from "leaking" out to the mapreduce framework, which will hopefully keep us below 20. At least, that's our current best guess as to why the mapreduce fails.

EppResources are loaded in the map stage to get the revisions, and CommitLogManifests are only loaded in the reduce stage as a sanity check so we don't accidentally delete resources we need in prod. Both of these are wrapped in transactNew to make sure they retry individually.

The only "load" not done inside a transaction is the EppResourceIndex, but there's no getting around that without rewriting the EppResourceInputs.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=164176764
2017-08-29 15:40:41 -04:00
mountford
2f238a2c77 Reduce number of authentication/authorization log statements
The auth logging has been useful, but it now generates a sizeable percentage of all logging, because it spits out three to five lines for every request in the system. This CL reduces that to two to three. We may eventually want to reduce it further, but this is a good start.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=164146182
2017-08-29 15:39:10 -04:00
mountford
f623d53e73 Remove invalid comment and add temp variable
It was not a problem after all to handle multiple scopes. Also added a temp variable to avoid making the same array conversion over and over.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=164002903
2017-08-29 15:34:49 -04:00
mountford
5fefa8906d Fix bug which caused exceptions when attempting to redirect to the console login page
When the registrar console code determines that a user has not logged in, it redirects to a login page. But when authenticating as an internal request (which should never happen), the redirection code encountered an exception, resulting in a 500 error.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163867018
2017-08-01 17:11:54 -04:00
mcilwain
2a29ada032 Allow multiple DNS writers on TLDs
This completes the data/functionality migration for multiple DNS writers.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163835077
2017-08-01 17:10:33 -04:00
mountford
05d22a2556 Add retry to claims list load
A NullPointerException reported via StackDriver appears to stem from trying to load the claims list right at the moment it was being updated. Since the update only happens once every 12 hours, retrying the load once should fix the problem, if this is really the cause.
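
A minimal sketch of the retry-once idea described above. RetryOnce and loadWithOneRetry are hypothetical names chosen for illustration, not the actual Nomulus code, and the sketch assumes the transient failure surfaces as a RuntimeException:

```java
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not the actual Nomulus implementation. */
final class RetryOnce {
  private RetryOnce() {}

  /** Calls the loader and, if it throws a RuntimeException, calls it exactly one more time. */
  static <T> T loadWithOneRetry(Supplier<T> loader) {
    try {
      return loader.get();
    } catch (RuntimeException firstFailure) {
      // Assumption: the failure is transient (e.g. a load racing an update).
      // A second failure propagates to the caller unchanged.
      return loader.get();
    }
  }
}
```

Since the update is said to happen only once every 12 hours, a single immediate retry is unlikely to hit the same window twice.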

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163732624
2017-08-01 17:09:10 -04:00
guyben
aee4f7acc2 Remove queueing from Lock
It was buggy (didn't work) and was never actually used.

Why never actually used: for it to be used, executeWithLock has to be called
with different requesters on the same lockId. That never happened in the code.

How it was buggy: Logically, the queue is deleted on release of the lock (meaning it was
meaningless the only time it mattered: when the lock isn't taken). In
addition, a different bug meant that having items in the queue prevented the
lock from being released, forcing all other tasks to wait for the lock
timeout even if the task that acquired the lock is long done.

Alternative: fix the queue. This would mean we don't want to delete the lock on release (since we want to keep the queue). Instead, we resave the same lock with the expiration date set to START_OF_TIME. In addition, we need to fix the .equals used to determine if the lock is the same as the acquired lock; instead, use some isSame function that ignores the queue.

Note: the queue is dangerous! An item (calling class / action) in the first place of a queue means no other calling class can get that lock. Everything is waiting for the first calling class to be re-run - but that might take a long time (depending on that action's rerun policy) and even might never happen (if for some reason that action decided it was no longer needed without acquiring the lock) - causing all other actions to stall forever!

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163705463
2017-08-01 17:06:20 -04:00
guyben
fa858ac5cf Remove unneeded "requester" from publishDnsUpdates locking
This is a quick fix we can hopefully get out fast before fixing the underlying problem.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163485468
2017-08-01 17:04:56 -04:00
larryruili
d2cd576796 Add standardSQL views to Bigquery Datastore snapshots
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163124895
2017-08-01 17:03:28 -04:00
mcilwain
8869814e96 Add logging statement for # of tasks in DNS queue
This will make DNS issues easier to debug retroactively as we will be
able to determine, by looking at the logs, if the queue size was growing
unbounded.

Also adds some logging helpers to allow programmatically choosing the level
of logging.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163123783
2017-08-01 17:02:00 -04:00
mcilwain
1a1fdfd531 Improve DNS logging messages for greater searchability
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163071619
2017-08-01 17:00:36 -04:00
mcilwain
d536cef20f Make Registrar load methods return Optionals instead of Nullables
This makes the code more understandable from callsites, and also forces
users of this function to deal with the situation where the registrar
with a given client ID might not be present (it was previously silently
NPEing from some of the callsites).

This also adds a test helper method loadRegistrar(clientId) that retains
the old functionality for terseness in tests. It also fixes some instances
of using the load method with the wrong cachedness: some uses in
high-traffic situations (WHOIS) should have caching, while low-traffic
reporting doesn't benefit from caching and so might as well always be
current.
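
A rough sketch of the Optional-returning pattern described above. It uses java.util.Optional and an in-memory map purely for self-containment; the class name, the loadByClientId name, and the sample data are assumptions, and the real load methods work against Datastore:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for the registrar load methods; not the actual Nomulus code. */
final class RegistrarLookupSketch {
  private static final Map<String, String> REGISTRARS = new HashMap<>();

  static {
    REGISTRARS.put("TheRegistrar", "The Registrar, Inc.");
  }

  private RegistrarLookupSketch() {}

  /** Returns an Optional so that callers must handle a missing registrar explicitly. */
  static Optional<String> loadByClientId(String clientId) {
    return Optional.ofNullable(REGISTRARS.get(clientId));
  }

  /** Test-style helper (assumed behavior): terse to call, fails loudly if absent. */
  static String loadRegistrar(String clientId) {
    return loadByClientId(clientId)
        .orElseThrow(
            () -> new IllegalArgumentException("No registrar with client ID " + clientId));
  }
}
```

With this shape, a callsite that forgets the absent case gets a descriptive exception at the point of lookup rather than a NullPointerException somewhere downstream.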

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162990468
2017-08-01 16:58:59 -04:00
larryruili
33eb5f1c87 Upgrade activity reporting queries to StandardSQL
This also brings the SQL template parameters in line with the anticipated Bigquery dataset format, and switches from DateTime to the more appropriate LocalDate (since we only need monthly granularity).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162747692
2017-08-01 16:56:12 -04:00
mcilwain
8a921f08ed Fix bad formatting/line breaks in Registry entity
This file was particularly bad for some reason.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162623626
2017-08-01 16:54:52 -04:00
mcilwain
d3e9ebad16 Remove deprecated singular DNS writer field and update tooling
Note that even though the nomulus command line tool now supports multiple
DNS writers for all subcommands, this still won't work quite yet because
the DNS task queue format migration from [] is still in progress.
After next week's push that migration will be complete and we can remove
the final restriction against only having one DNS writer per TLD.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162490399
2017-08-01 16:50:49 -04:00
guyben
8ff1102223 Add the ability to get_keyring_secret the public key from key pairs
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162399452
2017-08-01 16:49:29 -04:00
mcilwain
f771b32ece Fix checkApiServletClientId placeholder in production config sample
It should not be multiline, as registrar client ids are single short-ish identifiers with no spaces allowed. There's no way for them to span multiple lines.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162389442
2017-08-01 16:48:09 -04:00
mcilwain
d9613cf69a Add sanity check on commit log deletion
I know the query that finds commit logs should already ignore commit logs
that are too young, but this adds an explicit sanity check for safety's
sake, so we don't have to depend solely on an indexed query for safety.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162386413
2017-08-01 16:46:50 -04:00
guyben
268abbc383 Add option for dry run
The dry run does all the steps except the deletion. All the counters will
return the same values they would have returned on an actual run.
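
A minimal sketch of what such a dry-run flag could look like, assuming a simple in-memory store. DryRunSketch, deleteAll, and deletedCounter are hypothetical names; the real action runs as a mapreduce with framework counters:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a dry-run flag; not the actual Nomulus action. */
final class DryRunSketch {
  final List<String> store;
  int deletedCounter = 0;

  DryRunSketch(List<String> initialKeys) {
    this.store = new ArrayList<>(initialKeys);
  }

  /** Performs every step for each key, but skips the actual deletion on a dry run. */
  void deleteAll(List<String> keys, boolean isDryRun) {
    for (String key : keys) {
      if (!store.contains(key)) {
        continue; // Nothing to delete, so nothing to count.
      }
      if (!isDryRun) {
        store.remove(key);
      }
      // The counter is incremented either way, so both modes report the same value.
      deletedCounter++;
    }
  }
}
```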

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162379609
2017-08-01 16:45:28 -04:00
bbilbo
700148ae45 Add missing space between {$productName} and 'and'
Soy doesn't automatically add a space after a macro if it is the last element in
the line (https://screenshot.[].com/mTPYqE086Qk).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162376247
2017-08-01 16:44:06 -04:00
mcilwain
b671dd6451 Make dry run parameter documentation more understandable
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162373834
2017-08-01 16:42:42 -04:00
guyben
882db28ee0 Set the number of map shards to 20
This change is motivated by the sandbox run where we saw the backend instances overwhelmed by the 100 default shards to the point where they couldn't even answer a simple status request.

Production has 50 backend instances, so 20 shards will leave plenty of spare capacity for other tasks.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162357857
2017-08-01 16:41:19 -04:00
mcilwain
4a921973ea Add capability to sync DNS using multiple writers if configured
This is written in such a way that it can safely handle task items in the
old format so long as the DNS writer to use for the given TLD is unambiguous
(which it is for now, until we allow multiple DNS writers to be configured).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162293412
2017-08-01 16:38:36 -04:00
guyben
e224a67eda Change @Auth to an AutoValue, and create a set of predefined Auths
We want to be safer and more explicit about the authentication needed by the many actions that exist.

As such, we make the 'auth' parameter required in @Action (so it's always clear who can run a specific action) and we replace the @Auth with an enum so that only pre-approved configurations that are aptly named and documented can be used.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162210306
2017-08-01 16:33:10 -04:00
Ben McIlwain
5966d8077b Migrate TestVerb.withFailureMessage to use withMessage instead
Also updates Truth version to 0.34 where the replacement method was added.

More information: []

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161970305
2017-08-01 16:30:24 -04:00
guyben
bfde7dac0b Add info printouts for lock acquisition / release
Trying to debug lock acquisition issues with RdeStaging.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161966189
2017-08-01 16:29:01 -04:00
bbilbo
7d7048ac12 Declare types in Optional.absent() usage
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161865295
2017-08-01 16:26:18 -04:00
bbilbo
9688638c75 Use History Entry type for flows in VerifyOteServlet
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161855429
2017-08-01 16:24:55 -04:00
larryruili
4887811fc3 Add activity reporting SQL query generation code
This allows us to have a modular view of all tables used in activity reporting, to facilitate generating reports in BigQuery.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161849007
2017-08-01 16:23:31 -04:00
mcilwain
37f33e5e7a Migrate plural DNS writers field to being the canonical one
After this point all data is migrated to use the new canonical
plural version, and subsequent code changes can be made that use
multiple writers.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161673486
2017-08-01 16:12:42 -04:00
mcilwain
24587491c9 Make re-save environment entities command use batching
This makes it take a lot less time to run (roughly a 10X speedup).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161666391
2017-08-01 16:10:00 -04:00
jianglai
b235565eef Fix the build on MacOS
The build on MacOS fails (https://github.com/google/nomulus/issues/67) due to different syntax for sed on BSD vs. Linux.

See this StackOverflow discussion: https://stackoverflow.com/questions/5694228/sed-in-place-flag-that-works-both-on-mac-bsd-and-linux

Also adds a newline between @SuppressWarnings annotation and the class definition.

Note that MacOS support is best-effort only.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161661181
2017-07-12 11:03:50 -04:00
mcilwain
77b8729ec6 Add example OAuth client id to production sample YAML file
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161583881
2017-07-12 11:03:50 -04:00
mountford
3372ed718a Add documentation about OAuth2 client id configuration
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161571961
2017-07-12 11:03:50 -04:00
guyben
944d7a91d1 Update DeleteOldCommitLogs to only delete unreferenced logs
Now instead of deleting "all logs older than X", we delete "all logs older than
X that don't have any EppResource.getRevision()" pointing to them.
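
The selection rule above can be sketched as a filter over plain collections. The class and parameter names are hypothetical, and ages in days stand in for whatever timestamps the real mapreduce compares:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of the deletion rule; not the actual mapreduce. */
final class UnreferencedLogsSketch {
  private UnreferencedLogsSketch() {}

  /** Returns the IDs of logs that are older than the threshold and not referenced. */
  static Set<String> deletable(
      Map<String, Integer> ageInDaysByLogId, Set<String> referencedLogIds, int minAgeInDays) {
    Set<String> result = new TreeSet<>();
    for (Map.Entry<String, Integer> entry : ageInDaysByLogId.entrySet()) {
      boolean oldEnough = entry.getValue() > minAgeInDays;
      boolean referenced = referencedLogIds.contains(entry.getKey());
      if (oldEnough && !referenced) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```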

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161455827
2017-07-12 11:03:50 -04:00
Lai Jiang
e293590520 Revert "Add vim file ignore rule, and make build work on MacOS"
This reverts commit 9a53cba74d.
2017-07-11 22:04:03 -04:00
Lai Jiang
9a53cba74d Add vim file ignore rule, and make build work on MacOS
2017-07-11 22:02:38 -04:00
jianglai
a7aeb1924a Revert "Build on OS X"
Wait for proper push with cleaner solution.

This reverts commit 6469d58d98.
2017-07-11 14:45:10 -04:00
Lai Jiang
6469d58d98 Build on OS X
2017-07-11 00:20:31 -04:00
mcilwain
4d5b6845b7 Add plural DNS writers field to Registry entity
This is the first step in a multi-step data migration to allow multiple
DNS writers per TLD. The overall process looks like this:

1. Add a plural DNS writers field with backfill (this commit).
2. Deploy it.
3. Run the ResaveEnvironmentEntitiesCommand to populate this new field
   on all entities.
4. Update the code to use the new field everywhere.
5. Deploy it.
6. Delete the now-unreferenced, old deprecated singular value field.

This process is rollback-safe.
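
Step 1 of the process above can be pictured roughly as follows. This is a sketch with plain fields; the field names, the backfill method, and when it is invoked are all assumptions, and the real entity is persisted in Datastore:

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical stand-in for the Registry entity during the migration. */
final class RegistrySketch {
  /** Old singular field, kept until the final step of the migration. */
  String dnsWriter;

  /** New plural field, initially unset on existing entities. */
  Set<String> dnsWriters;

  /** Backfill hook, assumed to run whenever the entity is loaded or re-saved. */
  void backfill() {
    if (dnsWriters == null && dnsWriter != null) {
      dnsWriters = Collections.singleton(dnsWriter);
    }
  }
}
```

Because the singular field is left untouched until the last step, code from before the migration can still read it, which is one way to see why a rollback stays safe.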

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161253436
2017-07-10 11:45:13 -04:00
bbilbo
b39f368ea3 Fix typo in comment
Comment says publix suffix instead of public suffix

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161236147
2017-07-10 11:42:39 -04:00
guyben
1f25a862e6 Set KmsKeyring as the default Keyring
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161113761
2017-07-10 11:40:02 -04:00
jianglai
0013312f5c Export billing account map to registrar sheet
The billing account map will be serialized in the following format:

{currency1=id1, currency2=id2, ...}

In order for the output to be deterministic, the billing account map is stored as a sorted map.
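
As an illustration of why a sorted map gives deterministic output, here is a sketch that uses plain String keys and values (an assumption; the real map may use richer types for currencies) and a hypothetical serialize helper:

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of the serialization format; not the actual export code. */
final class BillingAccountMapSketch {
  private BillingAccountMapSketch() {}

  /** Copies the entries into a TreeMap so they are printed in key order. */
  static String serialize(Map<String, String> billingAccountMap) {
    Map<String, String> sorted = new TreeMap<>(billingAccountMap);
    // AbstractMap.toString() renders entries as {key1=value1, key2=value2}.
    return sorted.toString();
  }
}
```

For example, a map holding USD=abc and JPY=xyz serializes as {JPY=xyz, USD=abc} regardless of insertion order.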

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161075814
2017-07-10 11:37:23 -04:00
mcilwain
80f8f5ac7f Add basic logging for async operation processing time
We're already storing this as a metric, but on a registry of our
scale these operations tend to only happen on a daily-ish basis,
for which seeing results in logs is easier to deal with than metrics
(and also still very light-weight).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=160552291
2017-07-10 11:31:56 -04:00
larryruili
4130a8a75e Create ICANN report upload action
This is the first step in moving the current []cron-Python reporting scripts
into App Engine, as an official part of the Nomulus package. This copies the
structure of RDE uploads, with a few changes specific to monthly reporting.

I've left some TODOs related to actually testing it on the ICANN endpoint, as we're still not sure how files to be uploaded will be staged, and whether we can actually ping their endpoint on valid ports (80 or 443).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=160408703
2017-07-10 11:27:58 -04:00
bbilbo
f721bda16d Update UpdateDomainCommand to use FormattingLogger
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=160299234
2017-07-10 11:26:37 -04:00
mcilwain
dccc99787e Add TLDs parameter to refresh DNS action
This will allow us to migrate one TLD at a time by refreshing all zones
on the specified TLD after dual-writing is enabled.

Note that the TLDs parameter is required, which seems reasonable given
that almost all imagined use cases would be on a by-TLD basis.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=160294546
2017-07-10 11:23:57 -04:00
bbilbo
bbdf9bfc38 Refactor CreateDomainCommand and add UpdateDomainCommand
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=160282921
2017-07-10 11:22:39 -04:00
mcilwain
30d5d05fdf Refactor/rename refresh all DNS action
I'm moving it out of the scrap folder too because there's nothing else
in there and we do want to retain this indefinitely because it's a useful
tool for performing DNS writer migrations.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=160168902
2017-07-10 11:18:41 -04:00
mountford
5a31be12ba RDAP: Allow domain and nameserver queries with no initial string under certain circumstances
Up to now, our search wildcard rules have been that there must be an initial string of at least two characters. If a wildcard is present after that, it can optionally be followed by a suffix specifying the TLD (for domains) or domain (for nameservers). So domain queries can look like:

example.tld
ex*
ex*.tld

and nameserver queries can look like:

ns1.example.tld
ns*.example.tld
ns*

But you can't do a domain query for *.tld, nor a nameserver query for *.example.tld. It would be nice to support such queries, and the presence of a valid TLD or domain makes them relatively efficient. This CL relaxes the restrictions to allow wildcards with no initial string if the suffix is present. For nameservers, the suffix must be a valid domain in the system, to avoid having to loop through all nameservers.

A side effect of the changes is to fix a shortcoming in the logic which caused wildcard nameserver searches to fail if the specified domain suffix referred to an external domain.

Entity searches are not affected, since they do not support suffixes.
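
The relaxed rules can be sketched as a small syntactic check. This is not the actual RDAP code: the class and method names are hypothetical, and the sketch assumes at most one wildcard, treats a one-character initial string before a wildcard as invalid, and does not model the requirement that a nameserver suffix be a domain that exists in the system:

```java
/** Hypothetical syntactic check for RDAP search patterns; not the actual Nomulus code. */
final class WildcardQuerySketch {
  private WildcardQuerySketch() {}

  static boolean isAllowed(String query) {
    int star = query.indexOf('*');
    if (star < 0) {
      return !query.isEmpty(); // No wildcard: an exact-match query.
    }
    if (query.indexOf('*', star + 1) >= 0) {
      return false; // Assumption: only a single wildcard is supported.
    }
    String initial = query.substring(0, star);
    String suffix = query.substring(star + 1);
    if (!suffix.isEmpty() && (!suffix.startsWith(".") || suffix.length() < 2)) {
      return false; // A suffix must look like ".tld" or ".example.tld".
    }
    if (initial.length() >= 2) {
      return true; // Original rule: at least two initial characters.
    }
    // Relaxed rule: no initial string is fine as long as a suffix is present.
    return initial.isEmpty() && !suffix.isEmpty();
  }
}
```

Under these assumptions, ex*, ex*.tld, *.tld and *.example.tld pass, while a bare * is still rejected.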

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=159856563
2017-07-10 11:16:03 -04:00