Updated Reporting (Beam pipeline), Registrar sync to sheets, and Cloud DNS.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=212811185
Adds the jpl_strict_deps feature to non-compliant packages that depend on java_x_proto_library targets. This will enable blaze
to enforce strict_deps by default while missing dependencies are added to these packages.
Changes were made using the newly released blaze flag:
USE_CANARY_BLAZE=nightly blaze build -k --experimental_java_proto_library_enforce_strict_deps
then extracting the packages from the resulting add_dep commands and, for each package, running:
buildozer 'add features -jpl_strict_deps' <package>:__pkg__
More information: []
Tested:
TAP sample presubmit queue
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206349847
Also adds Guy to the list of CONTRIBUTORS and removes Xooglers (who definitely won't be contributing more code).
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=202476218
This affects JSR305, JSR330, and Guava annotations.
The exact command run to generate this CL was:
build_cleaner '//third_party/java_src/gtld/...' -c '' --dep_restrictions='//third_party/java/jsr330_inject,//third_party/java/jsr305_annotations,[]'
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=202322747
Now that the large zone re-signing test is complete, we no longer need it.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=199507075
Currently, we have two different ways to parse a "set" parameter:
key=value1&key=value2&key=value3...
and
keys=value1,value2,value3
This is error prone for several reasons:
- different parts of the code must be "synchronized" to use the same style (the
place that creates the request, and the place that parses the request)
- for the key=value1&key=value2 style, we often use the same key name for the single
  value and the set value. This can result in subtle bugs where part of the
  code will successfully read the key assuming there's only one value (and will
  get just the first key=value1, ignoring the rest)
Here we transition everything to the keys=value1,value2,value3 method. This one
was chosen because:
- it's shorter
- it's more intuitive for users
- the key name is plural, differentiating it from the singular key=value that
other requests might need
-----------------------------------
To make sure there are no "transition issues", we will continue to support
(with warnings) the key=value1&key=value2 parameter parsing until we're sure we
haven't forgotten to update any part of the code.
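For illustration, a minimal sketch of the parsing (hypothetical helper, not the actual request-parameter code), assuming a servlet-style request and Guava:
  import com.google.common.base.Splitter;
  import com.google.common.collect.ImmutableSet;
  import java.util.logging.Logger;
  import javax.servlet.http.HttpServletRequest;

  public class SetParameterExample {
    private static final Logger logger = Logger.getLogger(SetParameterExample.class.getName());

    static ImmutableSet<String> extractSetOfParameters(
        HttpServletRequest req, String singularName, String pluralName) {
      String joined = req.getParameter(pluralName);
      if (joined != null) {
        // Preferred style: keys=value1,value2,value3
        return ImmutableSet.copyOf(
            Splitter.on(',').omitEmptyStrings().trimResults().split(joined));
      }
      String[] legacy = req.getParameterValues(singularName);
      if (legacy != null) {
        // Legacy key=value1&key=value2 style, still accepted (with a warning) during the transition.
        logger.warning("Deprecated repeated-parameter style used for " + singularName);
        return ImmutableSet.copyOf(legacy);
      }
      return ImmutableSet.of();
    }
  }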
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=198810681
This is a 'green' Flogger migration CL. Green CLs are intended to be as
safe as possible and should be easy to review and submit.
No changes should be necessary to the code itself prior to submission,
but small changes to BUILD files may be required.
Changes within files are completely independent of each other, so this CL
can be safely split up for review using tools such as Rosie.
For more information, see []
Base CL: 197826149
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=198560170
They were failing because the maximum App Engine task batch size is 1,000, and
we currently have more than 4,000 tasks in the pull queue. We keep re-uploading
those to NORDN because we're unable to delete the tasks after successful upload,
so the leases expire and they get processed again.
Also renames TaskEnqueuer to TaskQueueUtils to reflect its newly expanded role.
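For reference, a hedged sketch of the batching fix (helper name is hypothetical), assuming a list of leased TaskHandles and Guava's Lists.partition:
  import com.google.appengine.api.taskqueue.Queue;
  import com.google.appengine.api.taskqueue.TaskHandle;
  import com.google.common.collect.Lists;
  import java.util.List;

  public class TaskBatchExample {
    private static final int MAX_TASK_BATCH_SIZE = 1000;  // App Engine task queue API limit

    // Delete leased tasks in chunks no larger than the 1,000-task API maximum.
    static void deleteTasks(Queue queue, List<TaskHandle> tasks) {
      for (List<TaskHandle> batch : Lists.partition(tasks, MAX_TASK_BATCH_SIZE)) {
        queue.deleteTask(batch);
      }
    }
  }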
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=197060903
The optional code has been around for a while; we can get rid of it now.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=192344612
The parameters were optional during the transition to allow old jobs stuck in the queue to work properly. It's been two months now, so it's time to end the transition.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=190235532
Following b/74072938, our quota for our main projects (prod, sandbox, alpha) is now 5000 queries per 100s, which allows us to increase our client-side rate limit accordingly.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=188026911
This enables sharded DNS publishing on a per-TLD basis. Instead of a TLD-wide lock, the sharded scheme locks each update on the shard number, allowing parallel writes to DNS.
We allow N (the number of shards) to be 0 or 1 for no sharding, and N > 1 for an N-way sharding scheme. Unless explicitly set, all TLDs default to a numShards of 0, so we don't have to reload all registry objects explicitly.
WARNING: This will change the lock name upon deployment for the PublishDnsAction from "<TLD> Dns Updates" to "<TLD> Dns Updates shard 0". This may cause concurrency issues if the underlying DNSWriter is not parallel-write tolerant (currently all production usages are ZonemanWriter, which is parallel-tolerant, so no issues are expected).
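A rough sketch of the scheme (helper names are hypothetical, not the actual action code):
  public class ShardedLockExample {
    // Shard-qualified lock name; with numShards <= 1 everything lands in shard 0, which is
    // why the lock name changes from "<TLD> Dns Updates" to "<TLD> Dns Updates shard 0".
    static String lockName(String tld, int shard) {
      return String.format("%s Dns Updates shard %d", tld, shard);
    }

    // Hypothetical shard assignment: any stable hash works, as long as all updates for the
    // same item map to the same shard.
    static int shardFor(String itemName, int numShards) {
      return numShards <= 1 ? 0 : Math.abs(itemName.hashCode() % numShards);
    }
  }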
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=187525655
Added:
- dns/update_latency, which measures the time from when a DNS update was added to the pull queue until that update is committed by the DnsWriter
  - It doesn't check that, after being committed, the update was actually published in the DNS.
- dns/publish_queue_delay, which measures how long from the initial insertion into the push queue until a publishDnsUpdate action was handled. It is measured both for successes (which is what we care about) and for the various failures (which matter because the eventual success for that publishDnsUpdate will be greater than any of the previous failures)
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185995678
The task-queue API only allows reading 1000 tasks at a time, which was the original reason for this limit. We get over that limit by reading (and processing) items from the queue in a loop, 1000 at a time.
This is important because the 1000 dns-updates are shared among all TLDs,
meaning that a TLD with >1000 waiting updates can affect the update latency of
other TLDs.
In addition, this partially fixes the bug where, if there are more than 1000 updates to paused
/ non-existing TLDs, we completely block all updates to all TLDs.
By partially fixed, I mean "if we have around 1000 updates to paused TLDs, we will read them every time ReadDnsUpdates is called, ignore them, and only then get to the actual updates we want to process".
This works when around 1000 updates are waiting - but if paused TLDs have tens or hundreds of thousands of updates waiting, this might still choke up other TLDs (not to mention we keep reading / updating 10s or 100s of thousands of tasks in the queue, which is... bad).
A more thorough fix will come in a future CL, as it requires a more thorough change in the code.
Note that the queue lease command supports a maximum of 10 QPS. Any more than
that - and we get errors / empty results. Hence we limit our QPS to 9 to be on
the safe side.
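A hedged sketch of the lease loop described above (names are assumptions, not the actual ReadDnsQueueAction code):
  import com.google.appengine.api.taskqueue.LeaseOptions;
  import com.google.appengine.api.taskqueue.Queue;
  import com.google.appengine.api.taskqueue.QueueFactory;
  import com.google.appengine.api.taskqueue.TaskHandle;
  import com.google.common.util.concurrent.RateLimiter;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.TimeUnit;

  public class LeaseLoopExample {
    // Drain the dns-pull queue in lease batches of at most 1000 tasks, throttled to 9
    // lease calls per second to stay under the lease API's 10 QPS limit.
    static List<TaskHandle> leaseAllTasks() {
      Queue dnsPullQueue = QueueFactory.getQueue("dns-pull");
      RateLimiter rateLimiter = RateLimiter.create(9);
      List<TaskHandle> allTasks = new ArrayList<>();
      while (true) {
        rateLimiter.acquire();
        List<TaskHandle> batch =
            dnsPullQueue.leaseTasks(
                LeaseOptions.Builder.withCountLimit(1000).leasePeriod(20, TimeUnit.MINUTES));
        if (batch.isEmpty()) {
          return allTasks;
        }
        allTasks.addAll(batch);
      }
    }
  }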
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185218684
In publishDomain, we load the subordinate hosts of the domain from Datastore and compare its nameservers to them. For any nameserver that is in-bailiwick, we call publishSubordinateHost on it and stage the A/AAAA records of the host for publication.
This is superior to the old approach, which used hostName.endsWith(domainName) to check whether a nameserver is in-bailiwick, because that check mistakes ns.another-example.tld for a subordinate host of example.tld. It is also better than checking hostName.endsWith("." + domainName), which avoids that false positive but falls short in a corner case where the nameserver has been deleted before its superordinate domain's record is updated. In that case, subordinateHosts.contains(hostName) will be false but hostName.endsWith("." + domainName) will still be true.
Note that we still use the suffix check in filterGlueRecords because it is filtering on existing records from Cloud DNS. It is even advantageous to do so because if there were any orphaned glue records (suffix matches to the domain, but not actually in its subordinate host list - and there shouldn't be if everything is consistent), they would be retained by the filter and therefore be deleted when the staged changes are committed.
Also fixed a few tests that should have failed had we checked subordinate hosts.
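A minimal sketch of the new check, with the rejected suffix checks noted as comments (names are illustrative, not the actual publishDomain code):
  import java.util.Set;

  public class InBailiwickExample {
    static boolean isInBailiwick(String hostName, String domainName, Set<String> subordinateHosts) {
      // hostName.endsWith(domainName)        -> wrongly matches "ns.another-example.tld" for "example.tld"
      // hostName.endsWith("." + domainName)  -> still true for a host already deleted from the domain
      return subordinateHosts.contains(hostName);  // authoritative: the domain's own subordinate host list
    }
  }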
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184732005
Previously, CloudDnsWriter used InetAddress.toString() to produce the IPv4/IPv6
address string (e.g. 127.0.0.1 or 0:0:0:0:0:0:0:1) used as an argument to the
Cloud DNS API. However, this fails because InetAddress uses the format
"HostName/IpAddress" for toString(), with an empty string as the HostName
if unspecified. This resulted in the erroneous use of a leading slash (e.g.
"/127.0.0.1") in the address string passed to the API, causing all glue record
updates to fail.
This change replaces InetAddress.toString() with InetAddress.getHostAddress(),
which properly generates the IP address for the InetAddress. This also replaces
a lot of logic in the corresponding test with concrete equivalents, preventing
obvious errors like this from creeping up on us in the future.
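A small standalone demonstration of the difference:
  import java.net.InetAddress;
  import java.net.UnknownHostException;

  public class InetAddressExample {
    public static void main(String[] args) throws UnknownHostException {
      // An InetAddress created from raw bytes has no host name, so toString() prefixes a slash.
      InetAddress ip = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
      System.out.println(ip.toString());        // "/127.0.0.1" -- the form Cloud DNS rejects
      System.out.println(ip.getHostAddress());  // "127.0.0.1"  -- the form CloudDnsWriter now sends
    }
  }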
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184708896
"keepTasks" is a flag that prevents ReadDnsQueueAction from removing dns-update
tasks from the dns-pull queue, while still launching PublishDnsUpdates tasks to
update the DNS (meaning these tasks will be updated again in the next
ReadDnsQueueAction).
I'm not sure what the purpose of this flag is, but given that we now allow multiple
writers (meaning we can already publish the same DNS data multiple times) and given
that we can now recover from a bad writer (if a writer doesn't belong to a TLD,
we put the dns-updates queued for that writer back into the dns-pull queue), I
suspect we don't need it anymore.
Alternative considered: changing this to a "dryRun" flag that won't actually
launch PublishDnsUpdates tasks, but will log which tasks it would have
launched. Decided against it because we will still need to "own" any task for a
significant amount of time if there are many (tens of thousands) tasks in the
queue. Hence a "dryRun" will still affect any actual runs for some time.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=183997187
This fixes up the following problems:
1. Using string concatenation instead of the formatting variant methods.
2. Logging or swallowing exception messages without logging the exception
itself (this swallows the stack trace).
3. Unnecessary logging on re-thrown exceptions.
4. Use of formatting variant methods when not necessary.
5. Complicated logging statements involving significant processing not being
wrapped inside of a logging level check.
6. Redundant logging both of an exception itself and its message (this is
unnecessary duplication).
7. Use of the base Logger class instead of our FormattingLogger class.
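A generic illustration of items 1 and 2, shown with java.util.logging only to keep the snippet self-contained (the real code uses FormattingLogger, whose formatting variants look similar):
  import java.util.logging.Level;
  import java.util.logging.Logger;

  public class LoggingExample {
    private static final Logger logger = Logger.getLogger(LoggingExample.class.getName());

    static void handleFailure(Exception e, String domainName) {
      // The bad pattern: string concatenation, and the exception's stack trace is lost.
      logger.severe("Failed to publish " + domainName + ": " + e.getMessage());
      // The preferred pattern: use a formatting variant and pass the exception itself.
      logger.log(Level.SEVERE, String.format("Failed to publish %s", domainName), e);
    }
  }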
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=182419837
This allows grouping metrics based on the DnsWriter. We can already group by
the TLD, but since a TLD can have multiple writers, and since different writers
perform very differently from one another, it could be important to group by
writer as well.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=182255398
This is a temporary change used for b/71607306 only, and has TODOs to revert at
every change once the bug is done.
We want to check how Cloud DNS handles large (1M+ domain) zones, especially
during resigning. However, due to a separate bug (b/70980350, and maybe another
one) we can't currently create such a large zone in nomulus within the required
4 day timeframe. The most we managed is 300k domains.
We could wait until the bug is fixed, but if there's a problem with Cloud DNS -
we want to find out as fast as possible. Hence, this CL that allows us to
register 1M domains by creating "just" 100k domains in nomulus.
The CL creates a new "MultiplyingCloudDnsWriter" that, when publishing a
domain, pretends we are publishing 10 domains (the 9 additional
"fictional" domains get their Datastore data from the actual domain).
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=180962149
This is done first and foremost to stop "empty" commits that cause errors in
publishDnsUpdates; the Cloud DNS API fails when there are
no updates at all in a change.
Allowing this (a publish that results in nothing to change) is a requirement for the writer
to be idempotent: if we delete a domain, then run the writer to delete it again, we get
0 additions and 0 deletions, which would otherwise fail.
This isn't theoretical either - we've seen it happen, causing a
publishDnsUpdates to fail over and over again.
While fixing this, we also remove all RRS that are common between additions and
deletions. This is just an optimization and shouldn't affect behavior.
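A hedged sketch of both behaviors (field and helper names are assumptions, not the actual CloudDnsWriter code):
  import com.google.api.services.dns.model.ResourceRecordSet;
  import com.google.common.collect.Sets;
  import java.util.Set;

  public class CommitExample {
    // Drop record sets that appear in both lists (a pure optimization), then skip the
    // Cloud DNS change entirely if nothing is left, since an empty change is rejected.
    static void commitStagedChanges(
        Set<ResourceRecordSet> additions, Set<ResourceRecordSet> deletions) {
      Set<ResourceRecordSet> common = Sets.intersection(additions, deletions).immutableCopy();
      additions.removeAll(common);
      deletions.removeAll(common);
      if (additions.isEmpty() && deletions.isEmpty()) {
        return;  // e.g. re-deleting an already-deleted domain: 0 additions, 0 deletions
      }
      // ... build and submit the Cloud DNS Change from the remaining additions/deletions ...
    }
  }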
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=179525218
Before pushing an update to Cloud DNS, the CloudDnsWriter needs to read all the domain RRSs from Cloud DNS one by one to know what to delete.
Doing so sequentially results in update times that are too long (approx 200ms per domain, which is 20 seconds per batch of 100) severely limiting our QPS.
This CL uses concurrent threading to do the Cloud DNS queries in parallel. Unfortunately, my preferred method (Set.parallelStream) doesn't work on App Engine :(
This reduces the per-item time from 200ms to 80ms, which can be further reduced to 50ms if we remove the rate limiter (currently set to 20 per second).
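A hedged sketch of the fan-out, using an App Engine request thread factory since Set.parallelStream doesn't help there (names like fetchRecordsFor are hypothetical stand-ins):
  import com.google.appengine.api.ThreadManager;
  import java.util.HashMap;
  import java.util.HashSet;
  import java.util.Map;
  import java.util.Set;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;

  public class ParallelFetchExample {
    // Fan out the per-domain Cloud DNS lookups across threads, then join the results.
    static Map<String, Set<String>> fetchAllInParallel(Set<String> domainNames) throws Exception {
      ExecutorService executor =
          Executors.newFixedThreadPool(10, ThreadManager.currentRequestThreadFactory());
      try {
        Map<String, Future<Set<String>>> futures = new HashMap<>();
        for (String domain : domainNames) {
          futures.put(domain, executor.submit(() -> fetchRecordsFor(domain)));
        }
        Map<String, Set<String>> results = new HashMap<>();
        for (Map.Entry<String, Future<Set<String>>> entry : futures.entrySet()) {
          results.put(entry.getKey(), entry.getValue().get());  // blocks until that lookup finishes
        }
        return results;
      } finally {
        executor.shutdown();
      }
    }

    // Hypothetical stand-in for the per-domain lookup (~200ms each when done sequentially).
    static Set<String> fetchRecordsFor(String domainName) {
      return new HashSet<>();  // would list the existing resource record sets for this domain
    }
  }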
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=178126877
Currently, if for some reason publishDnsUpdates gets a request to publish
a domain via a DnsWriter that doesn't belong to that domain - it logs a warning
but publishes anyway.
This can happen when Writers are changed (swapped for a different writer)
leaving update commands "stuck" with the wrong writer.
Normally you'd expect these update commands to just publish their data and be
on their way. However, if the update fails for some reason (likely - if the
Writer change happened BECAUSE the updates are failing) then the same
publishDnsUpdate command will continue to run forever.
This CL changes the behavior for "publish to wrong DnsWriter" to instead
requeue the batched domains / hosts back to the Dns-pull queue, allowing them
to be re-batched (and hence published) with the correct DnsWriter(s). This
re-batching will take place in ReadDnsQueueAction.java
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=177863076
The last commit did not pick up all the changes because MOE incorrectly attributed some changes to the wrong commit. This commit should reconcile these. It also picks up some changes to how the Hamcrest library is depended upon in the BUILD file, which should have been included in previous commits.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=177637931
This removes some qualifiers that aren't necessary (e.g. public/abstract on interfaces, private on enum constructors, final on private methods, static on nested interfaces/enums) and uses Java 8 lambdas and other features where that's an improvement.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=177182945
They can be inferred correctly even in Java 7, and display as
compiler warnings in IntelliJ.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=173451087
Adding the following metrics:
- how long does an update take, per TLD
- number of domains published, per TLD
- number of hosts published, per TLD
All are distributions.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=172933834
Convert periods to hyphens in multi-part TLDs when using them as a zone name
(Cloud DNS doesn't allow periods in zone names).
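A minimal sketch of the mapping (helper name is hypothetical):
  public class ZoneNameExample {
    // Cloud DNS zone names cannot contain periods, so a multi-part TLD like "co.test"
    // maps to the zone name "co-test".
    static String toZoneName(String tld) {
      return tld.replace('.', '-');
    }
  }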
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=172007089
This was a surprisingly involved change. Some of the difficulties included
java.util.Optional purposely not being Serializable (so I had to move a
few Optionals in mapreduce classes to @Nullable) and having to add the Truth
Java8 extension library for assertion support.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=171863777
Add cloudDns.{rootUrl, servicePath} to allow us to point an environment at the
Cloud DNS staging API for testing. Make sandbox and alpha point to staging.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170340859
The existing CloudDnsWriter code uses ImmutableMap.Builder to construct the
map of DNS records to update. This has been seen to fail on alpha, presumably
in cases where host records and domain records produce duplicate updates for
a host.
Convert the Builder to a HashMap, allowing us to safely overwrite existing
records in the case of duplicates.
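A small standalone illustration of why the Builder fails where a HashMap doesn't:
  import com.google.common.collect.ImmutableMap;
  import java.util.HashMap;
  import java.util.Map;

  public class DuplicateKeyExample {
    public static void main(String[] args) {
      // HashMap: a duplicate update simply overwrites the earlier (identical) entry.
      Map<String, String> updates = new HashMap<>();
      updates.put("ns1.example.tld.", "A 127.0.0.1");
      updates.put("ns1.example.tld.", "A 127.0.0.1");

      // ImmutableMap.Builder: the same duplicate makes build() throw IllegalArgumentException.
      ImmutableMap.<String, String>builder()
          .put("ns1.example.tld.", "A 127.0.0.1")
          .put("ns1.example.tld.", "A 127.0.0.1")
          .build();
    }
  }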
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=170103421
This still retains the DnsWriter interface itself for better integration
with Dagger and to preserve the option of having a DNS writer that does
not have this requirement (e.g. because it is idempotent).
This also makes the commit check thread-safe, which is a nice-to-have.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=168451114
Right now - if there's an error during DnsWriter.publish*, all the publishes from
before that error will be committed, while all the publishes after that error
will not.
More than that - in some writers partial publishes can be committed, depending
on implementation.
This defines a new contract that publish* are only committed when .commit is
called. That way any error will simply mean no publish is committed.
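A sketch of the contract (method names approximate):
  interface DnsWriter {
    void publishDomain(String domainName);  // stage this domain's records; no write yet
    void publishHost(String hostName);      // stage this host's glue records; no write yet
    void commit();                          // atomically apply everything staged above
  }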
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=165708063
Log tasks and task count on the input side of the queue so we can track which
things go in.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=165026523
Attending to this old bug will improve our ability to perform zone comparisons between Datastore and the DNS provider. Right now, zone comparison finds some bogus differences, because the TTL we send to the DNS subsystem doesn't match the TTL we use when generating our local dump files.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=164635557
This completes the data/functionality migration for multiple DNS writers.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=163835077