Map over Key instead of actual instances when deleting old commit logs

Running DeleteOldCommitLogs in prod produced a lot of DatastoreTimeoutException errors. Our working assumption is that when we load so many CommitLogManifests (over 200 million of them), each with a slight chance of failing, at least some failures are almost certain.
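
To put rough numbers on that assumption, here is a back-of-the-envelope sketch. Only the 200 million count comes from the job itself. The per-load failure probability of one in a million is purely hypothetical (we have not measured it), and the loads are assumed to fail independently.

```java
final class LoadFailureEstimate {

  /** Expected number of failed loads out of n, given per-load failure probability p. */
  static double expectedFailures(long n, double p) {
    return n * p;
  }

  /** Probability that at least one of n independent loads fails. */
  static double probabilityOfAnyFailure(long n, double p) {
    // 1 - (1 - p)^n, computed with log1p/expm1 so that a tiny p does not lose precision.
    return -Math.expm1(n * Math.log1p(-p));
  }

  public static void main(String[] args) {
    long loads = 200_000_000L;     // from the commit message
    double perLoadFailure = 1e-6;  // ASSUMPTION: hypothetical value, not measured

    System.out.printf("expected failures: %.1f%n", expectedFailures(loads, perLoadFailure));
    System.out.printf(
        "P(at least one failure): %.6f%n", probabilityOfAnyFailure(loads, perLoadFailure));
  }
}
```

Under these assumed numbers the expected failure count across the whole job is in the hundreds, so whether a given shard stays under its limit depends on how the loads are split across shards and on how many of the failures are retried away.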

The shard aborts after 20 of these errors. By eliminating as many loads as possible and retrying the remaining loads inside a transaction, we effectively stop exceptions from "leaking" out to the mapreduce framework, which will hopefully keep us below 20. At least, that's our current best guess as to why the mapreduce fails.

EppResources are loaded in the map stage to get the revisions, and CommitLogManifests are loaded only in the reduce stage, as a sanity check so that we don't accidentally delete resources we need in prod. Both loads are wrapped in transactNew so that each one retries individually.
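
To illustrate why retrying each load individually helps, the sketch below wraps a single flaky load in a bounded retry loop, so transient failures are absorbed before they reach the caller. This is not how transactNew is implemented. The names `retrying` and `TransientLoadException` and the attempt limit are hypothetical and exist only for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class RetrySketch {

  /** Hypothetical stand-in for a transient datastore failure such as a timeout. */
  static final class TransientLoadException extends RuntimeException {
    TransientLoadException(String message) {
      super(message);
    }
  }

  /**
   * Runs {@code work}, retrying on TransientLoadException until it succeeds or
   * {@code maxAttempts} attempts have been made. Only the last failure escapes.
   */
  static <T> T retrying(int maxAttempts, Supplier<T> work) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    TransientLoadException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return work.get();
      } catch (TransientLoadException e) {
        last = e;  // remember the failure and try again
      }
    }
    throw last;  // every attempt failed, so the error finally reaches the caller
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    // A simulated load that times out twice and then succeeds.
    String result =
        retrying(
            3,
            () -> {
              if (calls.incrementAndGet() < 3) {
                throw new TransientLoadException("simulated timeout");
              }
              return "loaded";
            });
    System.out.println(result + " after " + calls.get() + " attempts");
  }
}
```

The point of the design is that a failure counts against the shard only when every retry of that one load fails, which should be much rarer than a single failed attempt.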

The only "load" not done inside a transaction is the EppResourceIndex, but there's no getting around that without rewriting the EppResourceInputs.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=164176764
guyben 2017-07-18 13:54:37 -04:00 committed by Ben McIlwain
parent 2f238a2c77
commit cf94d69a3e
5 changed files with 97 additions and 60 deletions

@@ -25,7 +25,7 @@ import java.util.List;
 import org.joda.time.DateTime;
 /** Base class for {@link Input} classes that map over {@link CommitLogManifest}. */
-public class CommitLogManifestInput extends Input<CommitLogManifest> {
+public class CommitLogManifestInput extends Input<Key<CommitLogManifest>> {
   private static final long serialVersionUID = 2043552272352286428L;
@@ -41,15 +41,16 @@ public class CommitLogManifestInput extends Input<CommitLogManifest> {
   }
   @Override
-  public List<InputReader<CommitLogManifest>> createReaders() {
-    ImmutableList.Builder<InputReader<CommitLogManifest>> readers = new ImmutableList.Builder<>();
+  public List<InputReader<Key<CommitLogManifest>>> createReaders() {
+    ImmutableList.Builder<InputReader<Key<CommitLogManifest>>> readers =
+        new ImmutableList.Builder<>();
     for (Key<CommitLogBucket> bucketKey : CommitLogBucket.getAllBucketKeys()) {
       readers.add(bucketToReader(bucketKey));
     }
     return readers.build();
   }
-  private InputReader<CommitLogManifest> bucketToReader(Key<CommitLogBucket> bucketKey) {
+  private InputReader<Key<CommitLogManifest>> bucketToReader(Key<CommitLogBucket> bucketKey) {
     return new CommitLogManifestReader(bucketKey, olderThan);
   }
 }

@@ -28,7 +28,7 @@ import java.util.NoSuchElementException;
 import org.joda.time.DateTime;
 /** {@link InputReader} that maps over {@link CommitLogManifest}. */
-class CommitLogManifestReader extends InputReader<CommitLogManifest> {
+class CommitLogManifestReader extends InputReader<Key<CommitLogManifest>> {
   private static final long serialVersionUID = 5117046535590539778L;
@@ -53,7 +53,7 @@ class CommitLogManifestReader extends InputReader<CommitLogManifest> {
   private int total;
   private int loaded;
-  private transient QueryResultIterator<CommitLogManifest> queryIterator;
+  private transient QueryResultIterator<Key<CommitLogManifest>> queryIterator;
   CommitLogManifestReader(Key<CommitLogBucket> bucketKey, Optional<DateTime> olderThan) {
     this.bucketKey = bucketKey;
@@ -83,7 +83,7 @@ class CommitLogManifestReader extends InputReader<CommitLogManifest> {
       // paused and restarted with a cursor before it would have reached the new entity.
       query = query.startAt(cursor);
     }
-    queryIterator = query.iterator();
+    queryIterator = query.keys().iterator();
   }
   /** Called occasionally alongside {@link #next}. */
@@ -123,7 +123,7 @@ class CommitLogManifestReader extends InputReader<CommitLogManifest> {
    * @throws NoSuchElementException if there are no more elements.
    */
   @Override
-  public CommitLogManifest next() {
+  public Key<CommitLogManifest> next() {
     loaded++;
     try {
       return queryIterator.next();