Update publish queue with practical retry params

The unlimited exponential backoff makes cascading failure a serious problem,
when encountering burst DNS load. Originally, it was exponential backoff, with min 1 sec max 1 hour.

This changes it to be linearly scaling from
30 seconds to 10 minutes. Min 30 seconds is used to avoid over-retrying due to lock contention. Max 10 minutes allows for more retries within our 1 hour SLA. Finally, we're
switching to linear scaling to increase the number of 'quick' retries for low
backoff time, before ultimately settling on the upper bound of 10 minutes (if a
task ever gets to that point, it's probably misconfigured.)

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=186041553
This commit is contained in:
larryruili 2018-02-16 13:49:38 -08:00 committed by jianglai
parent 1013e047b4
commit a365b82d42

View file

@ -10,6 +10,12 @@
<name>dns-publish</name> <name>dns-publish</name>
<rate>100/s</rate> <rate>100/s</rate>
<bucket-size>100</bucket-size> <bucket-size>100</bucket-size>
<!-- 30 sec backoff increasing linearly up to 10 minutes. -->
<retry-parameters>
<min-backoff-seconds>30</min-backoff-seconds>
<max-backoff-seconds>600</max-backoff-seconds>
<max-doublings>0</max-doublings>
</retry-parameters>
</queue> </queue>
<queue> <queue>