Commit graph

72 commits

Author SHA1 Message Date
mcilwain
183dae6e80 Migrate away fully from MockitoJUnitRunner
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=192291786
2018-04-10 17:01:04 -04:00
guyben
89d8ba93f2 Remove transition code from []
The parameters were optional during the transition to allow old jobs stuck in the queue to work properly. It's been 2 months now so it's time to end the transition.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=190235532
2018-04-02 16:35:20 -04:00
larryruili
fa989e754b Add sharded DNS publishing capability
This enables sharded DNS publishing on a per-TLD basis. Instead of a TLD-wide lock, the sharded scheme locks each update on the shard number, allowing parallel writes to DNS.

We allow N (the number of shards) to be 0 or 1 for no sharding, and N > 1 for an N-way sharding scheme. Unless explicitly set, all TLDs default to a numShards of 0, so we don't have to reload all registry objects explicitly.

WARNING: This will change the lock name upon deployment for the PublishDnsAction from "<TLD> Dns Updates" to "<TLD> Dns Updates shard 0". This may cause concurrency issues if the underlying DNSWriter is not parallel-write tolerant (currently all production usages are ZonemanWriter, which is parallel-tolerant, so no issues are expected).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=187525655
2018-03-06 19:14:26 -05:00
cushon
606b470cd0 Merge JUnitBackport's expectThrows into assertThrows
More information: https://github.com/junit-team/junit5/issues/531

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=187034408
2018-03-06 18:56:15 -05:00
guyben
6e4b2bd6a8 Add metric for DNS UPDATE latency
Added:
- dns/update_latency, which measures the time since a DNS update was added to the pull queue until that updates is committed to the DnsWriter
- - It doesn't check that after being committed, it was actually published in the DNS.

- dns/publish_queue_delay, which measures how long since the initial insertion to the push queue until a publishDnsUpdate action was handled. It measures both for successes (which is what we care about) and various failures (which are important because the success for that publishDnsUpdate will be > than any of the previous failures)

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185995678
2018-02-20 15:54:15 -05:00
guyben
bba975a991 Allow over 1000 dns-updates to be handled at once
The task-queue API only allows reading 1000 tasks at a time, hence the original reason for this limit. We get over that limit by reading (and processing) items from the queue in a loop - 1000 at a time.

This is important because the 1000 dns-updates are shared among all TLDs,
meaning that a TLD with >1000 waiting updates can affect the update latency of
other TLDs.

In addition, partially fixes the bug where if there are more than 1000 updates to paused
/ non-existing TLDs, we completely block all updated to all TLDs.

By partially fixed, I mean "if we have around 1000 updates to paused TLDs, we will read them every time ReadDnsUpdates is called, ignore then, and only then get to the actual updates we want to process".

This works for a number of 1000 updates waiting - but if paused TLDs have tens or hundreds of thousands of updates waiting - this might still choke up other TLDs (not to mention we keep reading / updating 10s or 100s of thousands of tasks in the queue, that's... bad.)

A more thorough fix will come in a future CL, as it requires a more thorough change in the code.

Note that the queue lease command supports a maximum of 10 QPS. Any more than
that - and we get errors / empty results. Hence we limit our QPS to 9 to be on
the safe side.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185218684
2018-02-20 15:42:09 -05:00
jianglai
1227046bcb Fix in-baliwick nameserver check bug in CloudDnsWriter
In publishDomain, we load the subordinate hosts of the domain from datastore and compare its nameservers to them. For any nameserver that is in-baliwick, we call publishSubordinateHost on it and stage the A/AAAA records of the host for publication.

This is superior to the old approach where we use hostName.endsWith(domainName) to check if a nameserver is in-baliwick because it will mistake ns.another-example.tld as a subordinate host of example.tld. It is also better than checking hostName.endsWith("." + domainName), which will catch false positives as above, but falls short in a corner case where the nameserver has been deleted before its superordinate domain's record is updated. In that case, subordinateHosts.cotains(hostName) will be false but hostName.endsWith("." + domainName) will still be true.

Note that we still use the suffix check in filterGlueRecords because it is filtering on existing records from Cloud DNS. It is even advantageous to do so because if there were (and there shouldn't be if everything is consistent) any orphaned glue records (suffix matches to the domain, but not actually in its subordinate host list), they would be retained by the filter and therefore be deleted when the staged changes are committed.

Also fixed a few tests that should have failed had we checked subrodinate hosts....

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184732005
2018-02-20 15:15:57 -05:00
larryruili
6280d74f1c Fix CloudDnsWriter glue record A/AAAA arguments
Previously, CloudDnsWriter used InetAddress.toString() to produce the ipv4/6
address string (i.e. 127.0.0.1 or 0:0:0:0:0:0:0:1) used as an argument to the
Cloud DNS API. However, this fails because InetAddress uses the format
"HostName/IpAddress" for toString(), which uses the empty string as a HostName
if unspecified. This resulted in the erroneous use of a prefix slash (i.e.
"/127.0.01") as an InetAddress argument, causing all glue record updates to
fail.

This change replaces InetAddress.toString() with InetAddress.getHostAddress(),
which properly generates the IP address for the InetAddress. This also replaces
a lot of logic in the corresponding test with concrete equivalents, preventing
obvious errors like this from creeping up on us in the future.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184708896
2018-02-20 15:11:18 -05:00
guyben
6bcd40f18a Remove "keepTasks" from ReadDnsQueueAction
"keepTasks" is a flag that prevents ReadDnsQueueAction from removing dns-update
tasks from the dns-pull queue, while still launching PublishDnsUpdates tasks to
update the DNS (meaning these tasks will be updated again in the next
ReadDnsQueueAction).

I'm not sure what's the purpose of this flag, but given we now allow multiple
writers (meaning we can already publish the same DNS multiple times) and given
that we can now recover from a bad writer (if a writer doesn't belong to a TLD,
we put the dns-updates queued for that writer back into the dns-pull queue) - I
suspect we don't need it anymore.

Alternative considered: changing this to a "dryRun" flag that won't actually
launch PublishDnsUpdates tasks, but will log which tasks it would have
launched. Decided against it because we will still need to "own" any task for a
significant amount of time if there are many (tens of thousands) tasks in the
queue. Hence a "dryRun" will still affect any actual runs for some time.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=183997187
2018-02-01 22:05:40 -05:00
guyben
bf321ca044 Add label for the DnsWriter in the publishDnsUpdates metrics
This allows grouping metrics based on the DnsWriter. We can already group by
the TLD, but since a TLD can have multiple writers, and since different writers
perform very differently from one another, it could be important to group by
writer as well.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=182255398
2018-01-19 14:53:21 -05:00
guyben
633eb3179a Skip RRS update if existing records are equal to desired records
This is done first and formost to stop "empty" commits that cause errors in
publishDnsUpdates. The reason being that the Cloud DNS api fails when there are
no updates at all in a change.

Allowing this is a requirement for the writer to be idempotent - if we delete a
domain, then run the writer to delete it again - we'll get 0 additions and 0
deletions which fails.

This isn't theoretical either - we've seen it happen, causing a
publishDnsUpdates to fail over and over again.

While fixing this, we also remove all RRS that are common between additions and
deletions. This is just an optimization and shouldn't affect behavior.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=179525218
2017-12-27 11:18:21 -05:00
mcilwain
7dc224627f Automatically refactor more exception testing to use new JUnit rules
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=178911894
2017-12-27 10:42:36 -05:00
mcilwain
03c782f38e Replace ExceptionRule with ExpectedException
This is in preparation for running the automatic refactoring script that
will replace all ExpectedExceptions with use of JUnit 4.13's assertThrows/
expectThrows.

Note that I have recorded the callsites of assertions about EppExceptions
being marshallable and will edit those specific assertions back in after
running the automatic refactoring script (which do not understand these).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=178812403
2017-12-13 12:43:45 -05:00
mcilwain
b825a2b5a8 Get rid of custom ExceptionRule methods
The only remaining methods on ExceptionRule after this are methods that
also exist on ExpectedException, which will allow us to, in the next CL,
swap out the one for the other and then run the automated refactoring to
turn it all into assertThrows/expectThrows.

Note that there were some assertions about root causes that couldn't
easily be turned into ExpectedException invocations, so I simply
converted them directly to usages of assertThrows/expectThrows.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=178623431
2017-12-13 12:43:45 -05:00
guyben
d87f01e7bf Fetch data from Cloud DNS in parallel
Before pushing an update to Cloud DNS, the CloudDnsWriter needs to read all the domain RRSs from Cloud DNS one by one to know what to delete.

Doing so sequentially results in update times that are too long (approx 200ms per domain, which is 20 seconds per batch of 100) severely limiting our QPS.

This CL uses Concurrent threading to do the Cloud DNS queries in parallel. Unfortunately, my preferred method (Set.parallelStream) doesn't work on App Engine :(

This reduces the per-item time from 200ms to 80ms, which can be further reduced to 50ms if we remove the rate limiter (currently set to 20 per second).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=178126877
2017-12-13 12:43:45 -05:00
guyben
8e33bc898f Requeue domains on wrong DnsWriter.
Currently, if for some reason publishDnsUpdates gets a request to publish
domains to a DnsWriter that doesn't belong to said domain - it logs a warning
but published anyway.

This can happen when Writers are changed (swapped for a different writer)
leaving update commands "stuck" with the wrong writer.

Normally you'd expect these update commands to just publish their data and be
on their way. However, if the update fails for some reason (likely - if the
Writer change happened BECAUSE the updates are failing) then the same
publishDnsUpdate command will continue to run forever.

This CL changes the behavior for "publish to wrong DnsWriter" to instead
requeue the batched domains / hosts back to the Dns-pull queue, allowing them
to be re-batched (and hence published) with the correct DnsWriter(s). This
re-batching will take place in ReadDnsQueueAction.java

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=177863076
2017-12-13 12:43:45 -05:00
jianglai
1c1f95992a Move backported JUnit file to third_party (part 2)
Last commit did not pick up all the changes because MOE incorrectly attributed some changes to the wrong commit. This commit should reconcile these. Also picked up some changes to how hamcrest library is depended upon in BUILD file, which should have been included in previous commits.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=177637931
2017-12-02 11:37:46 -05:00
mcilwain
e2db3f914e Clean up some code quality issues
This removes some qualifiers that aren't necessary (e.g. public/abstract on interfaces, private on enum constructors, final on private methods, static on nested interfaces/enums), uses Java 8 lambdas and features where that's an improvement

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=177182945
2017-12-01 22:14:06 -05:00
mcilwain
bbe2584da4 Refactor Guava functional methods to use lambdas
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=177027488
2017-12-01 22:14:05 -05:00
mcilwain
c7484b25e0 Automatically refactor some exception testing to use new JUnit rules
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=176550995
2017-11-21 18:56:04 -05:00
guyben
6f659659ff Simplify the CloudDnsWriter callWithRetry functional
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=176512218
2017-11-21 18:49:14 -05:00
mcilwain
2aa897e698 Remove unnecessary generic type arguments
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=175155365
2017-11-21 18:17:31 -05:00
mcilwain
d22986a0a3 Use compound return statements for greater readability
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=173451653
2017-11-07 17:12:57 -05:00
mcilwain
f59c3daf6d Remove unused Truth8.assertThat() imports in tests
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=173423036
2017-11-07 17:01:19 -05:00
guyben
d577a281b8 Add stackdriver metrics to publishDnsUpdates
Adding the following metrics:

- how long does an update take, per TLD
- number of domains published, per TLD
- number of hosts published, per TLD

All are distributions.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=172933834
2017-10-24 16:53:47 -04:00
mmuller
bf818a0139 Translate multi-part TLD zone names
Convert periods to hyphens in multi-part TLDs when using them as a zone name
(cloud-dns doesn't allow periods in zone names).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=172007089
2017-10-24 16:53:47 -04:00
mcilwain
c0f8da0c6e Switch from Guava Optionals to Java 8 Optionals
This was a surprisingly involved change. Some of the difficulties included
java.util.Optional purposely not being Serializable (so I had to move a
few Optionals in mapreduce classes to @Nullable) and having to add the Truth
Java8 extension library for assertion support.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=171863777
2017-10-24 16:53:47 -04:00
guyben
840d53c819 Allow EventSample.record to accept numSamples=0
There's really no reason not to.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=171037754
2017-10-24 16:50:30 -04:00
mcilwain
5edb7935ed Run automatic Java 8 conversion over codebase
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=171174380
2017-10-10 12:09:41 -04:00
mmuller
0c8b5bc8bf Build DNS changes with HashMap instead of Builder
The existing CloudDnsWriter code uses ImmutableMap.Builder to construct the
map of DNS records to update.  This has been seen to fail on alpha, presumably
in a cases where host records and domain records produce duplicate updates for
a host.

Convert the Builder to a HashMap, allowing us to safely overwrite existing
records in the case of duplicates.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170103421
2017-10-04 16:16:45 -04:00
guyben
c3861f6e95 Swap all uses of Lock to LockHandler
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=167661348
2017-09-12 15:51:50 -04:00
guyben
d5ac03aae4 Make DnsWriter truly atomic
Right now - if there's an error during DnsWriter.publish*, all the publish from
before that error will be committed, while all the publish after that error
will not.

More than that - in some writers partial publishes can be committed, depending
on implementation.

This defines a new contract that publish* are only committed when .commit is
called. That way any error will simply mean no publish is committed.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=165708063
2017-08-29 16:40:07 -04:00
mountford
2547313ef9 Use config settings for DNS TTL values across all code
Attending to this old bug will improve our ability to perform zone comparisons between Datastore and the DNS provider. Right now, zone comparison finds some bogus differences, because the TTL we send to the DNS subsystem doesn't match the TTL we use when generating our local dump files.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=164635557
2017-08-29 15:50:44 -04:00
mcilwain
2a29ada032 Allow multiple DNS writers on TLDs
This completes the data/functionality migration for multiple DNS writers.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163835077
2017-08-01 17:10:33 -04:00
mcilwain
8869814e96 Add logging statement for # of tasks in DNS queue
This will make DNS issues easier to debug retroactively as we will be
able to determine, by looking at the logs, if the queue size was growing
unbounded.

Also adds some logging helpers to allow programmatically choosing the level
of logging.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163123783
2017-08-01 17:02:00 -04:00
mcilwain
4a921973ea Add capability to sync DNS using multiple writers if configured
This is written in such a way that it can safely handle task items in the
old format so long as the DNS writer to use for the given TLD is unambiguous
(which it is for now, until we allow multiple DNS writers to be configured).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=162293412
2017-08-01 16:38:36 -04:00
mcilwain
37f33e5e7a Migrate plural DNS writers field to being the canonical one
After this point all data is migrated to use the new canonical
plural version, and subsequent code changes can be made that use
multiple writers.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=161673486
2017-08-01 16:12:42 -04:00
nickfelt
65aaeccfc6 Fix TaskQueueHelper param matching for pull queue tasks
This fixes TaskQueueHelper methods and MatchableTaskInfo so that the .param() matching works for pull queues, by parsing the payload for URL-encoded parameters more liberally.  As such, it updates all the places where formerly we were hacking around this by manually constructing the expected payloads and using TaskMatcher.payload() instead.

It also adds a TaskQueueHelper.assertTasksEnqueued() overload that accepts an Iterable<TaskStateInfo> so that you can cleanly assert that a queue contains the same tasks that were returned via a previous call to getQueueInfo("queue").getTaskInfo().

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=156604901
2017-05-23 17:22:49 -04:00
mountford
e00b0440b2 Roll back accidental inclusion of non-public Mockito helper methods
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=154232020
2017-04-26 11:19:59 -04:00
mountford
cb145e0721 Remove some unnecessary uses of MockitoJUnitRunner, which is discouraged
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=154202151
2017-04-26 11:15:07 -04:00
mountford
d35be27b65 Remove some unnecessary uses of MockitoJUnitRunner, which is discouraged
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=153589104
2017-04-26 10:42:43 -04:00
mcilwain
cdadb54acd Refer to Datastore everywhere correctly by its capitalized form
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=147479683
2017-02-17 12:12:12 -05:00
mcilwain
f212a53232 Make dependency injection and construction of DnsQueue nicer
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=146783008
2017-02-07 13:26:13 -05:00
mmuller
b70f57b7c7 Update copyright year on all license headers
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=146111211
2017-02-02 16:27:22 -05:00
mcilwain
eaec03e670 Move ConfigModule and LocalTestConfig into RegistryConfig
This is the final preparatory step necessary in order to load and load
configuration from YAML in a static context and then provide it either via
Dagger (using ConfigModule) or through RegistryConfig's existing static
functions.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=143819983
2017-01-09 12:01:09 -05:00
jart
734130aa73 Restructure Maven dependencies in build
We're now using java_import_external instead of maven_jar. This allows
us to specify the relationships between jars, thereby allowing us to
eliminate scores of vendor BUILD files that did nothing but re-export
@foo//jar targets, thus addressing the concerns of djhworld on Hacker
News: https://news.ycombinator.com/item?id=12738072

We now have redundant failover mirrors, which is a feature I added to
Bazel 0.4.2 in ed7ced0018

A new standard naming convention is now being used for all Maven repos.
Those names are calculated from the group_artifact name using the
following algorithm that eliminates redundancy:
https://gist.github.com/jart/41bfd977b913c2301627162f1c038e55

The JSR330 dep has been removed from java targets if they also depend
on Dagger, since Dagger always exports JSR330.

Annotation processor dependencies should now be leaner and meaner, by
more appropriately managing what needs to be on the classpath at
runtime. This should trim down the production jar by >1MB. As it stands
currently in the open source world:

- backend_jar_deploy.jar: 50MB
- frontend_jar_deploy.jar: 30MB
- tools_jar_deploy.jar: 45MB

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=143487929
2017-01-09 11:59:04 -05:00
mcilwain
28f6c770c8 Add MOE equivalence for sync on 2016-12-19
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=142449539
2017-01-09 11:59:04 -05:00
mcilwain
2b7d580bb3 Run buildifier on codebase to format BUILD files
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=140362453
2016-11-28 18:15:21 -05:00
nickfelt
e6ba5687b1 Migrate writeLockTimeout field out of DnsQueue
This makes the usage of DnsQueue.create() safer, since we're no longer
forced to hardcode a copy of the @Config("dnsWriteLockTimeout") value
within that method.  That value is only needed for leaseTasks(), which
is only called in one place (ReadDnsQueueAction), so we can just pass
it in from that callsite.

Also removes an unused overload of leaseTasks() that allowed specifying
a tag, which is a feature we no longer need.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=136162491
2016-10-14 17:00:33 -04:00
mcilwain
6a738557fb Use Dagger to @Inject DnsQueue everywhere that is feasible
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=136062053
2016-10-14 16:58:07 -04:00