Mirror of https://github.com/google/nomulus.git, synced 2025-05-16 17:37:13 +02:00
Lengthen the publishDnsUpdates maximum lock duration
Having a short maximum lock duration doesn't affect lock performance, since the lock is only in use while the command is running anyway (which doesn't depend on the maximum lock duration). It only affects behavior when the command's running time exceeds the maximum lock duration: in that case the command fails, retries, and fails again forever.

The short value may be a left-over from the old code, where publishDnsUpdates itself read the domains from the pull queue and published them, meaning that killing the command did not undo the work already done.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=182255446
commit f22a42cd42
parent bf321ca044
1 changed file with 11 additions and 7 deletions
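To make the failure mode described in the commit message concrete, here is a minimal, self-contained sketch. It is not the actual Nomulus lock code; the class LockDurationSketch, the attemptPublish method, and the 90-second batch running time are all hypothetical, chosen only to illustrate that a maximum lock duration shorter than the command's running time causes an endless retry loop, while a generous maximum costs nothing because the lock is released as soon as the command finishes.

import org.joda.time.Duration;

/**
 * Hypothetical sketch of the behavior described in the commit message: the maximum
 * lock duration only matters when the command outlives it, in which case the work
 * is thrown away and the task retries forever.
 */
public final class LockDurationSketch {

  /** One attempt "commits" only if the command finishes before the lock's maximum duration. */
  static boolean attemptPublish(Duration maxLockDuration, Duration commandRunningTime) {
    return !commandRunningTime.isLongerThan(maxLockDuration);
  }

  public static void main(String[] args) {
    // Hypothetical worst-case time to publish and commit one full batch of DNS updates.
    Duration batchRunningTime = Duration.standardSeconds(90);

    // Old maximum (75 seconds): every attempt is killed before it commits, so it retries forever.
    Duration oldMax = Duration.standardSeconds(75);
    for (int attempt = 1; attempt <= 3; attempt++) {
      System.out.printf(
          "75s lock, attempt %d: %s%n",
          attempt, attemptPublish(oldMax, batchRunningTime) ? "committed" : "killed, will retry");
    }

    // New maximum (3 minutes): the same batch commits on the first attempt. The longer maximum
    // costs nothing, because the lock is released as soon as the command finishes.
    Duration newMax = Duration.standardMinutes(3);
    System.out.printf(
        "3m lock: %s%n",
        attemptPublish(newMax, batchRunningTime) ? "committed" : "killed, will retry");
  }
}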
@@ -274,14 +274,18 @@ public final class RegistryConfig {
   @Config("dnsWriteLockTimeout")
   public static Duration provideDnsWriteLockTimeout() {
     /*
-     * Optimally, we would set this to a little less than the length of the DNS refresh cycle,
-     * since otherwise, a new PublishDnsUpdatesAction could get kicked off before the current one
-     * has finished, which will try and fail to acquire the lock. However, it is more important
-     * that it be greater than the DNS write timeout, so that if that timeout occurs, it will be
-     * cleaned up gracefully, rather than having the lock time out. So we have to live with the
-     * possible lock failures.
+     * This is the maximum lock duration for publishing the DNS updates, meaning it should allow
+     * the various DnsWriters to publish and commit an entire batch (with a maximum number of
+     * items set by provideDnsTldUpdateBatchSize).
+     *
+     * Any update that takes longer than this timeout will be killed and retried from scratch.
+     * Hence, a timeout that's too short can result in batches that retry over and over again,
+     * failing forever.
+     *
+     * If there are lock contention issues, they should be solved by changing the batch sizes
+     * or the cron job rate, NOT by making this value smaller.
      */
-    return Duration.standardSeconds(75);
+    return Duration.standardMinutes(3);
   }
 
   /**
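For context on how a @Config-qualified provider like provideDnsWriteLockTimeout() is typically consumed, the following is a simplified Dagger sketch. The nested @Config qualifier, ConfigModule, PublishAction, and ActionComponent here are illustrative stand-ins written for this example, not the actual Nomulus classes; only the provider method body matches the diff above.

import dagger.Component;
import dagger.Module;
import dagger.Provides;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import javax.inject.Inject;
import javax.inject.Qualifier;
import org.joda.time.Duration;

public final class ConfigInjectionSketch {

  /** Simplified stand-in for a string-keyed @Config qualifier annotation. */
  @Qualifier
  @Retention(RetentionPolicy.RUNTIME)
  public @interface Config {
    String value();
  }

  @Module
  public static class ConfigModule {
    @Provides
    @Config("dnsWriteLockTimeout")
    public static Duration provideDnsWriteLockTimeout() {
      // Must comfortably exceed the worst-case time to publish and commit one batch.
      return Duration.standardMinutes(3);
    }
  }

  /** Hypothetical consumer that injects the configured timeout by its qualifier name. */
  public static class PublishAction {
    private final Duration lockTimeout;

    @Inject
    PublishAction(@Config("dnsWriteLockTimeout") Duration lockTimeout) {
      this.lockTimeout = lockTimeout;
    }

    void run() {
      System.out.println("Acquiring DNS write lock for at most " + lockTimeout);
      // ... acquire the lock with this maximum duration, publish the batch, release ...
    }
  }

  @Component(modules = ConfigModule.class)
  interface ActionComponent {
    PublishAction publishAction();
  }

  public static void main(String[] args) {
    // DaggerConfigInjectionSketch_ActionComponent is generated by the Dagger annotation processor.
    PublishAction action = DaggerConfigInjectionSketch_ActionComponent.create().publishAction();
    action.run();
  }
}

The point of the change is visible through this lens: the injected timeout is a ceiling on how long one publish attempt may hold the lock, so it must comfortably exceed the worst-case batch running time; lock contention should instead be tuned via the batch size or cron rate.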