"keepTasks" is a flag that prevents ReadDnsQueueAction from removing dns-update
tasks from the dns-pull queue, while still launching PublishDnsUpdates tasks to
update the DNS (meaning these tasks will be updated again in the next
ReadDnsQueueAction).
I'm not sure what's the purpose of this flag, but given we now allow multiple
writers (meaning we can already publish the same DNS multiple times) and given
that we can now recover from a bad writer (if a writer doesn't belong to a TLD,
we put the dns-updates queued for that writer back into the dns-pull queue) - I
suspect we don't need it anymore.
Alternative considered: changing this to a "dryRun" flag that won't actually
launch PublishDnsUpdates tasks, but will log which tasks it would have
launched. Decided against it because we will still need to "own" any task for a
significant amount of time if there are many (tens of thousands) tasks in the
queue. Hence a "dryRun" will still affect any actual runs for some time.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=183997187
This allows grouping metrics based on the DnsWriter. We can already group by
the TLD, but since a TLD can have multiple writers, and since different writers
perform very differently from one another, it could be important to group by
writer as well.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=182255398
This is done first and formost to stop "empty" commits that cause errors in
publishDnsUpdates. The reason being that the Cloud DNS api fails when there are
no updates at all in a change.
Allowing this is a requirement for the writer to be idempotent - if we delete a
domain, then run the writer to delete it again - we'll get 0 additions and 0
deletions which fails.
This isn't theoretical either - we've seen it happen, causing a
publishDnsUpdates to fail over and over again.
While fixing this, we also remove all RRS that are common between additions and
deletions. This is just an optimization and shouldn't affect behavior.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=179525218
This is in preparation for running the automatic refactoring script that
will replace all ExpectedExceptions with use of JUnit 4.13's assertThrows/
expectThrows.
Note that I have recorded the callsites of assertions about EppExceptions
being marshallable and will edit those specific assertions back in after
running the automatic refactoring script (which do not understand these).
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=178812403
The only remaining methods on ExceptionRule after this are methods that
also exist on ExpectedException, which will allow us to, in the next CL,
swap out the one for the other and then run the automated refactoring to
turn it all into assertThrows/expectThrows.
Note that there were some assertions about root causes that couldn't
easily be turned into ExpectedException invocations, so I simply
converted them directly to usages of assertThrows/expectThrows.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=178623431
Before pushing an update to Cloud DNS, the CloudDnsWriter needs to read all the domain RRSs from Cloud DNS one by one to know what to delete.
Doing so sequentially results in update times that are too long (approx 200ms per domain, which is 20 seconds per batch of 100) severely limiting our QPS.
This CL uses Concurrent threading to do the Cloud DNS queries in parallel. Unfortunately, my preferred method (Set.parallelStream) doesn't work on App Engine :(
This reduces the per-item time from 200ms to 80ms, which can be further reduced to 50ms if we remove the rate limiter (currently set to 20 per second).
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=178126877
Currently, if for some reason publishDnsUpdates gets a request to publish
domains to a DnsWriter that doesn't belong to said domain - it logs a warning
but published anyway.
This can happen when Writers are changed (swapped for a different writer)
leaving update commands "stuck" with the wrong writer.
Normally you'd expect these update commands to just publish their data and be
on their way. However, if the update fails for some reason (likely - if the
Writer change happened BECAUSE the updates are failing) then the same
publishDnsUpdate command will continue to run forever.
This CL changes the behavior for "publish to wrong DnsWriter" to instead
requeue the batched domains / hosts back to the Dns-pull queue, allowing them
to be re-batched (and hence published) with the correct DnsWriter(s). This
re-batching will take place in ReadDnsQueueAction.java
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=177863076
Last commit did not pick up all the changes because MOE incorrectly attributed some changes to the wrong commit. This commit should reconcile these. Also picked up some changes to how hamcrest library is depended upon in BUILD file, which should have been included in previous commits.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=177637931
This removes some qualifiers that aren't necessary (e.g. public/abstract on interfaces, private on enum constructors, final on private methods, static on nested interfaces/enums), uses Java 8 lambdas and features where that's an improvement
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=177182945
Adding the following metrics:
- how long does an update take, per TLD
- number of domains published, per TLD
- number of hosts published, per TLD
All are distributions.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=172933834
Convert periods to hyphens in multi-part TLDs when using them as a zone name
(cloud-dns doesn't allow periods in zone names).
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=172007089
This was a surprisingly involved change. Some of the difficulties included
java.util.Optional purposely not being Serializable (so I had to move a
few Optionals in mapreduce classes to @Nullable) and having to add the Truth
Java8 extension library for assertion support.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=171863777
The existing CloudDnsWriter code uses ImmutableMap.Builder to construct the
map of DNS records to update. This has been seen to fail on alpha, presumably
in a cases where host records and domain records produce duplicate updates for
a host.
Convert the Builder to a HashMap, allowing us to safely overwrite existing
records in the case of duplicates.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170103421
Right now - if there's an error during DnsWriter.publish*, all the publish from
before that error will be committed, while all the publish after that error
will not.
More than that - in some writers partial publishes can be committed, depending
on implementation.
This defines a new contract that publish* are only committed when .commit is
called. That way any error will simply mean no publish is committed.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=165708063
Attending to this old bug will improve our ability to perform zone comparisons between Datastore and the DNS provider. Right now, zone comparison finds some bogus differences, because the TTL we send to the DNS subsystem doesn't match the TTL we use when generating our local dump files.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=164635557
This completes the data/functionality migration for multiple DNS writers.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163835077
This will make DNS issues easier to debug retroactively as we will be
able to determine, by looking at the logs, if the queue size was growing
unbounded.
Also adds some logging helpers to allow programmatically choosing the level
of logging.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163123783
This is written in such a way that it can safely handle task items in the
old format so long as the DNS writer to use for the given TLD is unambiguous
(which it is for now, until we allow multiple DNS writers to be configured).
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162293412
After this point all data is migrated to use the new canonical
plural version, and subsequent code changes can be made that use
multiple writers.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161673486
This fixes TaskQueueHelper methods and MatchableTaskInfo so that the .param() matching works for pull queues, by parsing the payload for URL-encoded parameters more liberally. As such, it updates all the places where formerly we were hacking around this by manually constructing the expected payloads and using TaskMatcher.payload() instead.
It also adds a TaskQueueHelper.assertTasksEnqueued() overload that accepts an Iterable<TaskStateInfo> so that you can cleanly assert that a queue contains the same tasks that were returned via a previous call to getQueueInfo("queue").getTaskInfo().
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=156604901
This is the final preparatory step necessary in order to load and load
configuration from YAML in a static context and then provide it either via
Dagger (using ConfigModule) or through RegistryConfig's existing static
functions.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=143819983
We're now using java_import_external instead of maven_jar. This allows
us to specify the relationships between jars, thereby allowing us to
eliminate scores of vendor BUILD files that did nothing but re-export
@foo//jar targets, thus addressing the concerns of djhworld on Hacker
News: https://news.ycombinator.com/item?id=12738072
We now have redundant failover mirrors, which is a feature I added to
Bazel 0.4.2 in ed7ced0018
A new standard naming convention is now being used for all Maven repos.
Those names are calculated from the group_artifact name using the
following algorithm that eliminates redundancy:
https://gist.github.com/jart/41bfd977b913c2301627162f1c038e55
The JSR330 dep has been removed from java targets if they also depend
on Dagger, since Dagger always exports JSR330.
Annotation processor dependencies should now be leaner and meaner, by
more appropriately managing what needs to be on the classpath at
runtime. This should trim down the production jar by >1MB. As it stands
currently in the open source world:
- backend_jar_deploy.jar: 50MB
- frontend_jar_deploy.jar: 30MB
- tools_jar_deploy.jar: 45MB
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=143487929
This makes the usage of DnsQueue.create() safer, since we're no longer
forced to hardcode a copy of the @Config("dnsWriteLockTimeout") value
within that method. That value is only needed for leaseTasks(), which
is only called in one place (ReadDnsQueueAction), so we can just pass
it in from that callsite.
Also removes an unused overload of leaseTasks() that allowed specifying
a tag, which is a feature we no longer need.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=136162491
aka regexing for fun and profit.
This also makes sure that there are no statements after the
throwing statement, since these would be dead code. There
were a surprising number of places with assertions after
the throw, and none of these are actually triggered in tests
ever. When I found these, I replaced them with try/catch/rethrow
which makes the assertions actually happen:
before:
// This is the ExceptionRule that checks EppException marshaling
thrown.expect(FooException.class);
doThrowingThing();
assertSomething(); // Dead code!
after:
try {
doThrowingThing();
assertWithMessage("...").fail();
} catch (FooException e) {
assertSomething();
// For EppExceptions:
assertAboutEppExceptins().that(e).marshalsToXml();
}
To make this work, I added EppExceptionSubject.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=135793407
They were taking a DateTime "now", which would seem like it would be the time of
when the resource was deleted, but it was actually the time by which the
resource was deleted, with the actual deletion time being hardcoded to a day
prior. The confusion was evident because a fair number of tests were passing
the wrong thing. I renamed the parameter "deletionTime" to make it exactly
clear what it's doing and fixed up some callsites where necessary.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=134818032
This reworks the logic in RefreshDnsAction by factoring out a few
helper methods so the core logic is simpler and more straightforward.
Also added a couple tests to DnsInjectionTest that seemed worth having
for symmetry.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=134110706
This change replaces all Ref objects in the code with Key objects. These are
stored in datastore as the same object (raw datastore keys), so this is not
a model change.
Our best practices doc says to use Keys not Refs because:
* The .get() method obscures what's actually going on
- Much harder to visually audit the code for datastore loads
- Hard to distinguish Ref<T> get()'s from Optional get()'s and Supplier get()'s
* Implicit ofy().load() offers much less control
- Antipattern for ultimate goal of making Ofy injectable
- Can't control cache use or batch loading without making ofy() explicit anyway
* Serialization behavior is surprising and could be quite dangerous/incorrect
- Can lead to serialization errors. If it actually worked "as intended",
it would lead to a Ref<> on a serialized object being replaced upon
deserialization with a stale copy of the old value, which could potentially
break all kinds of transactional expectations
* Having both Ref<T> and Key<T> introduces extra boilerplate everywhere
- E.g. helper methods all need to have Ref and Key overloads, or you need to
call .key() to get the Key<T> for every Ref<T> you want to pass in
- Creating a Ref<T> is more cumbersome, since it doesn't have all the create()
overloads that Key<T> has, only create(Key<T>) and create(Entity) - no way to
create directly from kind+ID/name, raw Key, websafe key string, etc.
(Note that Refs are treated specially by Objectify's @Load method and Keys are not;
we don't use that feature, but it is the one advantage Refs have over Keys.)
The direct impetus for this change is that I am trying to audit our use of memcache,
and the implicit .get() calls to datastore were making that very hard.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=131965491