It seems that even though the token is supposed to be valid for 60 minutes, in
practice it expires before that. Reducing the caching time to 30 minutes solves
the problem (at least as far as I can tell). This should not increase load too
much, as we are only calling the API twice an hour instead of once.
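For illustration, the shape of the change, sketched with a Guava-style loading cache (the token type and loader here are stand-ins, not our actual code):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

class TokenCache {
  private final LoadingCache<String, String> cache =
      CacheBuilder.newBuilder()
          // Was 60 minutes (the token's nominal lifetime); tokens were
          // observed to expire early, so cache for half that instead.
          .expireAfterWrite(30, TimeUnit.MINUTES)
          .build(CacheLoader.from(this::fetchFreshToken));

  private String fetchFreshToken() {
    // Now invoked at most twice an hour instead of once.
    return callTokenApi();
  }

  private String callTokenApi() {
    throw new UnsupportedOperationException("stand-in for the real API call");
  }
}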
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=186830395
The RDAP Pilot Program operational profile document indicates that domain
responses should list, in addition to their normal contacts, a special entity
for the registrar.
1.5.12. The domain object in the RDAP response MUST contain an entity with the registrar role (called registrar entity in this section). The handle of the entity MUST be equal to the IANA Registrar ID. A valid fn member MUST be present in the registrar entity. Other members MAY be present in the entity (as specified in RFC6350, the vCard Format Specification and its corresponding JSON mapping RFC7095). Contracted parties MUST include an entity with the abuse role (called Abuse Entity in this section) within the registrar entity. The Abuse Entity MUST include tel and email members, and MAY include other members.
1.5.13. The entity with the registrar role in the RDAP response MUST contain a publicIDs member [RFC7483] to identify the IANA Registrar ID from the IANA’s Registrar ID registry (https://www.iana.org/assignments/registrar-ids/registrar-ids.xhtml). The type value of the publicID object MUST be equal to IANA Registrar ID.
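For reference, a rough sketch of the shape such a registrar entity takes, using nested immutable maps to stand in for the JSON (all concrete values are invented examples, not real registry data):

import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;

ImmutableMap<String, Object> registrarEntity =
    ImmutableMap.<String, Object>builder()
        .put("objectClassName", "entity")
        // The handle MUST equal the IANA Registrar ID.
        .put("handle", "1234")
        .put("roles", ImmutableList.of("registrar"))
        // A vcardArray with a valid fn member is also required (RFC 7095).
        .put("publicIds", ImmutableList.of(
            ImmutableMap.of("type", "IANA Registrar ID", "identifier", "1234")))
        // Nested abuse entity; its vCard MUST include tel and email members.
        .put("entities", ImmutableList.of(
            ImmutableMap.of(
                "objectClassName", "entity",
                "roles", ImmutableList.of("abuse"))))
        .build();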
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=186797360
I'm actually surprised that we had this in our code, as it seems like a huge
oversight, but we were individually loading each and every referenced contact
and host during domain/application create/update/allocate flows. Loading them
all as a single batch should reduce round trips to Datastore by a good deal,
thus improving performance.
We aren't giving up any transactional consistency in doing so and the only
potential downside I can think of is that we're always loading all contacts/
hosts instead of only some of them, in the rare case that one of the earlier
contacts/hosts is actually in pending delete (which allowed short-circuiting).
However, the gains from only making one round-trip should swamp the potential
losses in occasionally loading more data than is strictly necessary.
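Schematically, the change looks like this (a sketch, not the actual flow code; EppResource stands in for the contact/host types and verifyNotInPendingDelete is a hypothetical check):

import static com.googlecode.objectify.ObjectifyService.ofy;

import com.googlecode.objectify.Key;
import java.util.Map;
import java.util.Set;

// Before (illustrative): one Datastore round trip per referenced resource.
//   for (Key<EppResource> key : referencedKeys) {
//     verifyNotInPendingDelete(ofy().load().key(key).now());
//   }
// After: a single batched round trip for all of them.
void loadAndVerifyReferences(Set<Key<EppResource>> referencedKeys) {
  Map<Key<EppResource>, EppResource> resources = ofy().load().keys(referencedKeys);
  for (EppResource resource : resources.values()) {
    verifyNotInPendingDelete(resource); // hypothetical pending-delete check
  }
}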
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=186656118
Currently we validate the fee extension by summing up all fees present in the extension and comparing it against the total fee to be charged. While this works in most cases, we'd like the ability to individually validate each fee. This is especially useful during EAP, when two fees are charged: a regular "create" fee that would also be the amount we charge during renewal, and a one-time "EAP" fee.
Because we can only distinguish fees by their descriptions, we try to match the description to the format string of the fee type enums. We also only require individual fee matches when we are charging more than one type of fee, which makes the change compatible with most existing use cases, where only one fee is charged and the description field in the extension is ignored.
We expect the workflow to be that a registrar sends a domain check, and we reply with exactly what fees we are expecting, and then it will use the descriptions in the response to send us a domain create with the correct fees.
Note that we aggregate fees within the same FeeType together. Normally there will only be one fee per type, but with custom logic there could be more than one fee for the same type. There is no way to distinguish them, as they both use the same description, so it is simpler to just aggregate them.
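The aggregation step, as a sketch (Fee, FeeType, and their accessors here approximate the real fee extension classes):

import java.math.BigDecimal;
import java.util.EnumMap;
import java.util.Map;

// Sum all fees sharing a FeeType; same-type fees produced by custom logic
// carry identical descriptions and cannot be told apart, so aggregate them.
Map<FeeType, BigDecimal> totalByType = new EnumMap<>(FeeType.class);
for (Fee fee : fees) {
  totalByType.merge(fee.getType(), fee.getCost(), BigDecimal::add);
}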
This CL also includes some reformatting that conforms to google-java-format output.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=186530316
The unlimited exponential backoff makes cascading failure a serious problem
when encountering bursts of DNS load. Originally, retries backed off
exponentially, from a minimum of 1 second to a maximum of 1 hour.
This changes the backoff to scale linearly from 30 seconds to 10 minutes. The
30-second minimum avoids over-retrying due to lock contention; the 10-minute
maximum allows for more retries within our 1-hour SLA. Finally, we're
switching to linear scaling to increase the number of 'quick' retries at low
backoff times, before ultimately settling on the upper bound of 10 minutes (if
a task ever gets to that point, it's probably misconfigured).
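The new schedule is effectively this (a sketch; the real values live in the retry configuration):

import org.joda.time.Duration;

// Linear backoff: 30s, 60s, 90s, ..., capped at 10 minutes - versus the old
// exponential schedule of 1s, 2s, 4s, ..., capped at 1 hour.
Duration backoffForAttempt(int attempt) {
  return Duration.standardSeconds(Math.min(30L * attempt, 600L)); // 600s = 10 min
}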
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=186041553
A recent change in Netty 4.1.21 (978a46cc0a) tried to fix an issue where channels might be closed before any handshake exception could be propagated. However, this introduced a regression where the connection is not closed at all after a handshake failure, which caused test failures because we were expecting the connection to be closed after a handshake failure.
We rolled back the dependency on Netty 4.1.21 so that the tests pass. A fix upstream is scheduled for 4.1.22 (https://github.com/netty/netty/pull/7727).
However, this does reveal a potential problem in our tests: namely, we did not wait for the connection to be closed before asserting on it. The old Netty behavior closes the connection before the handshake exception is thrown, and we *do* wait for the handshake exception. The connection assertion happens after the handshake exception is verified, so by then the connection is always closed.
When the upstream fix is released, we'd run into the concurrency problem described above. So we instead wait for the connection to be closed before checking the handshake exception (by releasing the lock in a channel close listener), which guarantees that when we check the connection, it is always closed.
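In outline, the test fix looks like this (simplified; the real tests coordinate through their own locks):

import static com.google.common.truth.Truth.assertThat;

import java.util.concurrent.CountDownLatch;

// Release the latch only once the channel is actually closed, so the final
// assertion cannot race the close triggered by the handshake failure.
CountDownLatch closed = new CountDownLatch(1);
channel.closeFuture().addListener(future -> closed.countDown());

// ... trigger and verify the handshake failure ...

closed.await(); // InterruptedException propagates; fine in a test method
assertThat(channel.isActive()).isFalse(); // now guaranteed to be closed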
Also fixes some javadoc errors.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=186021997
With https://github.com/bazelbuild/bazel/issues/4376, bazel 0.10.0 now supports accessing system TMPDIR in its sandbox. Use this instead of hardcoding /tmp in BUILD rules to get around the gpg-agent path length restriction.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=186010932
Added:
- dns/update_latency, which measures the time from when a DNS update was added to the pull queue until that update is committed to the DnsWriter
  - It doesn't check that, after being committed, the update was actually published in the DNS.
- dns/publish_queue_delay, which measures the time from the initial insertion into the push queue until a publishDnsUpdate action is handled. It measures both successes (which is what we care about) and various failures (which are important because the success delay for that publishDnsUpdate will be greater than that of any of the previous failures)
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185995678
The intention (from []) is:
- actualAsString() is the method that people call.
- actualCustomStringRepresentation() is the method that people override.
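Sketched against the Truth API of this era (the subject class below is hypothetical):

import com.google.common.truth.FailureMetadata;
import com.google.common.truth.Subject;
import org.joda.time.Duration;

class DurationSubject extends Subject<DurationSubject, Duration> {
  DurationSubject(FailureMetadata metadata, Duration actual) {
    super(metadata, actual);
  }

  // Overridden to customize how the actual value renders in failure
  // messages; callers read it back via actualAsString(), never by calling
  // this method directly.
  @Override
  protected String actualCustomStringRepresentation() {
    return actual().getMillis() + " millis";
  }
}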
Fortunately, no one actually calls actualCustomStringRepresentation(), aside from some tests that call it to test a subject's implementation. That's easy enough to work around by extracting a method.
(Arguably @ForOverride should permit calls from tests in some cases (now that Error Prone knows how to identify test code). But it's not entirely clear, since, e.g., people shouldn't be testing Converter.doForward(null) because the method can never be invoked that way. Some discussion here: [])
Tested:
global TAP
[]
RELNOTES=Marked `actualCustomStringRepresentation()` as `@ForOverride`. To retrieve the string representation, call `actualAsString()`.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185672328
The START_DATE_SUNRISE phase allows registration of domains only with a signed mark. In all other respects, it is identical to the GENERAL_AVAILABILITY phase.
Note that Anchor Tenants bypass all checks, and are hence able to register domains without a signed mark.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185534793
The task-queue API only allows reading 1000 tasks at a time, which was the original reason for this limit. We get around that limit by reading (and processing) items from the queue in a loop, 1000 at a time.
This is important because the 1000 dns-updates are shared among all TLDs,
meaning that a TLD with >1000 waiting updates can affect the update latency of
other TLDs.
In addition, this partially fixes the bug where, if there are more than 1000 updates to paused
/ non-existent TLDs, we completely block all updates to all TLDs.
By partially fixed, I mean "if we have around 1000 updates to paused TLDs, we will read them every time ReadDnsUpdates is called, ignore them, and only then get to the actual updates we want to process".
This works when around 1000 updates are waiting - but if paused TLDs have tens or hundreds of thousands of updates waiting, this might still choke up other TLDs (not to mention that we keep reading / updating tens or hundreds of thousands of tasks in the queue; that's... bad).
A more thorough fix will come in a future CL, as it requires a more thorough change in the code.
Note that the queue lease command supports a maximum of 10 QPS; any more than
that and we get errors / empty results. Hence we limit our QPS to 9 to be on
the safe side.
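Schematically, the lease loop looks like this (a sketch; dnsUpdateQueue, the lease period, and processBatch are illustrative):

import static com.google.appengine.api.taskqueue.LeaseOptions.Builder.withCountLimit;

import com.google.appengine.api.taskqueue.TaskHandle;
import com.google.common.util.concurrent.RateLimiter;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Client-side cap of 9 QPS, under the task queue's 10 QPS lease limit.
RateLimiter leaseRateLimiter = RateLimiter.create(9.0);
List<TaskHandle> batch;
do {
  leaseRateLimiter.acquire();
  batch = dnsUpdateQueue.leaseTasks(withCountLimit(1000).leasePeriod(20, TimeUnit.MINUTES));
  processBatch(batch); // hypothetical: group updates and launch PublishDnsUpdates tasks
} while (!batch.isEmpty());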
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185218684
When a quota request is rejected, increment the metric counter by one.
Also makes both frontend and backend metrics singletons, because all the fields they have are static.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185146804
SoySyntaxException is an abstract exception type and is never even declared to be thrown (all such declarations were changed about 2 years ago), so places catching it should either change to catch SoyCompilationException, or do nothing and let it propagate.
Tested:
TAP sample presubmit queue
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185050724
The quota handler terminates connections when quota is exceeded.
The next CL will add instrumentation for quota related metrics.
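In outline, the handler behaves like this (a minimal sketch, not the actual class; QuotaManager is a hypothetical collaborator):

import io.netty.channel.Channel;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

interface QuotaManager { // hypothetical collaborator
  boolean tryAcquire(Channel channel);
}

class QuotaHandler extends ChannelInboundHandlerAdapter {
  private final QuotaManager quotaManager;

  QuotaHandler(QuotaManager quotaManager) {
    this.quotaManager = quotaManager;
  }

  @Override
  public void channelRead(ChannelHandlerContext ctx, Object msg) {
    if (!quotaManager.tryAcquire(ctx.channel())) {
      ctx.close(); // quota exceeded: terminate the connection
      return;
    }
    ctx.fireChannelRead(msg); // within quota: pass the message along
  }
}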
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185042675
Changes the code to be in compliance with the RDAP Pilot Profile document,
which specifies:
1.4.11. If permitted or required by an ICANN agreement provision, waiver, or Consensus Policy, an RDAP response may contain redacted registrant, administrative, technical and/or other contact information. If any information is redacted, the response MUST include a remarks member with title "Data Policy", type "object truncated due to authorization", a description containing the string "Some of the data in this object has been removed" and a links member with the elements rel:alternate and href indicating where the data policy can be found. An entity with redacted information MUST include the "removed" value in the status element.
We were using the "removed" status to indicate deleted contacts and inactive
registrars. Instead, we will now use "inactive", so that we can use "removed"
to indicate redaction.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185039201
Every time you run nomulus tool you currently get a bunch of useless output
to the console that looks like this:
---
Feb 08, 2018 3:11:18 PM google.registry.config.YamlUtils mergeYaml
INFO: Successfully loaded environment configuration YAML file.
Feb 08, 2018 3:11:20 PM com.google.wrappers.base.GoogleInit logArgs
INFO: First call to GoogleInit.initialize - removeFlags: false, args: [ProcessUtils, --noinstall_signal_handlers]
Feb 08, 2018 3:11:20 PM com.google.wrappers.base.GoogleInit logArgs
INFO: Subsequent call to GoogleInit.initialize, ignoring - removeFlags: false, args: [SecureWrapperBindings (via google.registry.tools.RegistryTool), --noinstall_signal_handlers]
Feb 08, 2018 3:11:25 PM com.google.monitoring.metrics.MetricRegistryImpl newIncrementableMetric
INFO: Registered new counter: /lock/acquire_lock_requests
Feb 08, 2018 3:11:25 PM com.google.monitoring.metrics.MetricRegistryImpl newEventMetric
INFO: Registered new event metric: /lock/lock_duration
---
This CL fixes that by increasing the console logging threshold from INFO to
WARNING for the relevant paths, for nomulus tool only.
I also had to decrease the logging level of one statement inside YamlUtils
from INFO to FINE, because it was being called by AppEngineConnectionFlags'
constructor in building the HostAndPort server field, which is executed
from the first line of RegistryCli.runCommand(), whereas
loggingParams.configureLogging(), which actually reads in and takes action
on the logging.properties file, isn't called until much later. This is fine
though, because there's little value from logging the statement
"Successfully loaded environment configuration YAML file." every time every
command or flow is executed. We certainly do log errors if that ever fails,
which is the important part.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185036329
As part of our commit log layer that we have built on top of Objectify, we
enforce the constraint of a monotonically increasing transaction time with a
millisecond granularity. Thus, if two transactions occur at exactly the same
millisecond, as had been the case with these tests, one will get a
TimestampInversionException and retry. However, since we're mocking time in
these tests as well, they will retry at exactly the same millisecond, and thus
continue failing for the same reason until the max retry threshold is hit. The
EPP flow then ultimately fails with a generic "Command failed" response. All
of this is based on actual findings from examining test logs from a flake.
It's a mystery to me why these tests were merely flaky; it seems like they
should have always been failing for this reason, but they were still only
sometimes failing. Who knows.
The fix is simple: adjust the tests so that no two commands are run at exactly
the same millisecond. Note that this is a test-only problem; in the real world,
a command that temporarily fails will simply then succeed the next time it is
retried, since time is actually elapsing. This implies that our commit log system
imposes a max mutation rate of 1,000 QPS across our entire system. This is
unlikely to be a problem in practice for any existing registry of any size.
Also note that, as far the EPP XML itself is concerned, times only have second
granularity, so up to a thousand commands can execute in the same second and
still "appear" to have taken place at the same time as far as EPP is concerned.
That's why this CL only adds millisecond precision to the actual run time, not
to the expected values in the commands.
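The pattern in the fixed tests is roughly this (helper names approximate the real test utilities):

// Advance the mock clock between commands so that no two transactions share
// a millisecond, avoiding TimestampInversionException retries that would
// re-run at the same instant.
clock.advanceOneMilli();
runFlowAssertResponse(loadFile("domain_create_response.xml"));
clock.advanceOneMilli();
runFlowAssertResponse(loadFile("domain_update_response.xml"));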
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184777558
The higher the number, the better for serious launches. These used to be 100
but had been detuned because instances weren't dying correctly when no longer
needed, thus contributing to higher costs than necessary. That problem was
fixed when we migrated to the Java 8 runtime, however, so there's no reason
not to use the higher number.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184742738
In publishDomain, we load the subordinate hosts of the domain from Datastore and compare the domain's nameservers to them. For any nameserver that is in-bailiwick, we call publishSubordinateHost on it and stage the A/AAAA records of the host for publication.
This is superior to the old approach, where we used hostName.endsWith(domainName) to check if a nameserver is in-bailiwick, because that check mistakes ns.another-example.tld for a subordinate host of example.tld. It is also better than checking hostName.endsWith("." + domainName), which avoids false positives like the above but falls short in a corner case where the nameserver has been deleted before its superordinate domain's record is updated. In that case, subordinateHosts.contains(hostName) will be false but hostName.endsWith("." + domainName) will still be true.
Note that we still use the suffix check in filterGlueRecords, because there it is filtering on existing records from Cloud DNS. It is even advantageous to do so: if there were any orphaned glue records (suffix matches to the domain, but not actually in its subordinate host list - and there shouldn't be, if everything is consistent), they would be retained by the filter and therefore deleted when the staged changes are committed.
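The check is now, in essence:

// Old, suffix-based check: wrongly treats ns.another-example.tld as
// subordinate to example.tld, and can be stale after a host deletion.
//   boolean inBailiwick = hostName.endsWith("." + domainName);

// New, authoritative check against the stored subordinate host list:
boolean inBailiwick = domain.getSubordinateHosts().contains(hostName);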
Also fixed a few tests that should have failed had we checked subordinate hosts...
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184732005
See [] for more information.
Created with the tools in []
Tested:
TAP --sample for global presubmit queue
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184727400
It's been long enough since the format change adding in years that all
registrars should no longer have any IDs in the old format lying around
that they're still attempting to ACK. All poll messages have already been
coming back to registrars with the new format for months now.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184714735
Previously, CloudDnsWriter used InetAddress.toString() to produce the IPv4/IPv6
address string (e.g. 127.0.0.1 or 0:0:0:0:0:0:0:1) used as an argument to the
Cloud DNS API. However, this fails because InetAddress uses the format
"HostName/IpAddress" for toString(), with the empty string as the HostName
if unspecified. This resulted in the erroneous use of a prefixed slash (e.g.
"/127.0.0.1") as an InetAddress argument, causing all glue record updates to
fail.
This change replaces InetAddress.toString() with InetAddress.getHostAddress(),
which properly generates the IP address for the InetAddress. This also replaces
a lot of logic in the corresponding test with concrete equivalents, preventing
obvious errors like this from creeping up on us in the future.
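A quick demonstration of the difference:

import java.net.InetAddress;

InetAddress address = InetAddress.getByName("127.0.0.1"); // throws UnknownHostException
address.toString();        // "/127.0.0.1" - "HostName/IpAddress" with an empty host name
address.getHostAddress();  // "127.0.0.1"  - just the address literal we want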
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184708896
Now that we've verified the new Beam billing pipeline works, we can delete the
old manual commands we used to use.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184707182
The DS records consist of 4 values:
- keyTag: unsigned short (2 bytes)
- alg: unsigned byte
- digestType: unsigned byte
- digest: binary hex
NOTE: the current CL doesn't support keyData, either as the optional field in dsData or as a replacement for dsData.
The command tool accepts DS records as a string, where the 4 values are given
as one string separated by whitespace as follows:
<keyTag> <alg> <digestType> <digest>
e.g. something like:
60485 5 2 D4B7D520E7BB5F0F67674A0CCEB1E3E0614B93C4F9E99B8383F6A1E4469DA50A
which is how it's written in zone files, allowing easy copy-paste from existing values.
Using commas as separators is confusing when using spaces.
The various "numbers" (keyTag, alg, digestType) are only checked that they are
positive integers - the rest is left for the server.
digest it checked to be an even-lengthed hex string.
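Roughly, the parsing is (a sketch given the input string dsRecord; the real command code may differ in details):

import static com.google.common.base.Preconditions.checkArgument;

import com.google.common.base.Splitter;
import java.util.List;

List<String> parts =
    Splitter.onPattern("\\s+").omitEmptyStrings().splitToList(dsRecord.trim());
checkArgument(parts.size() == 4, "Expected 4 whitespace-separated values");
int keyTag = Integer.parseUnsignedInt(parts.get(0));     // basic integer checks
int alg = Integer.parseUnsignedInt(parts.get(1));        // only; full validation
int digestType = Integer.parseUnsignedInt(parts.get(2)); // is left to the server
String digest = parts.get(3);
checkArgument(digest.length() % 2 == 0, "digest must be an even-length hex string");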
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184583068
Also changed the version checker tuple from strings to ints, so that 0.10.0 compares as larger than 0.4.2.
I think we should just get rid of the version checker altogether. It still requires 0.4.2 as the minimal bazel version, which most likely will not work with Nomulus at this point.
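The failure mode, concretely:

// Lexicographic comparison sorts "0.10.0" before "0.4.2" because '1' < '4':
assert "0.10.0".compareTo("0.4.2") < 0; // wrong ordering for versions
// Comparing the components as ints gives the intended order:
// (0, 10, 0) > (0, 4, 2).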
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184536748
Rosie CL for []/third_party (local approval/rejection).
[]
b/71392935
Tested:
TAP --sample for global presubmit queue
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184412611
When enabled for a registrar, all EPP operations on premium domains that have
costs (e.g. creates, renews, transfers) will fail unless the EPP fee extension
is used to explicitly ack the fee amount as part of the EPP transaction.
This ack is required regardless of whether premium fee acking is required at
the registry level. No data migration is necessary since false is the desired
default for this new attribute.
This CL also contains some slight refactoring of static utility methods used to
perform fee verification; there was short-circuiting at call-sites in two
places when what was really needed was two methods, one implementing additional
functionality on top of the other, and calling the inner method in the places
where short-circuiting had previously been necessary.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=184229363
"keepTasks" is a flag that prevents ReadDnsQueueAction from removing dns-update
tasks from the dns-pull queue, while still launching PublishDnsUpdates tasks to
update the DNS (meaning the same updates will be processed again by the next
ReadDnsQueueAction).
I'm not sure what the purpose of this flag is, but given that we now allow multiple
writers (meaning we can already publish the same DNS multiple times) and given
that we can now recover from a bad writer (if a writer doesn't belong to a TLD,
we put the dns-updates queued for that writer back into the dns-pull queue) - I
suspect we don't need it anymore.
Alternative considered: changing this to a "dryRun" flag that won't actually
launch PublishDnsUpdates tasks, but will log which tasks it would have
launched. Decided against it because we will still need to "own" any task for a
significant amount of time if there are many (tens of thousands) tasks in the
queue. Hence a "dryRun" will still affect any actual runs for some time.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=183997187
This is a follow-up to []
Also added jaxws-api Maven dependency and upgraded activation artifacts to 1.2.0, in parity with //third_party/java/activation.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=183714304
There are 2 types of changes done here:
- reorder the existing cron jobs to be in the same order as production (for
easier diffing)
- add missing cron-jobs to either alpha or sandbox
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=183232936
This moves the default yearMonth logic into a common ReportingModule, rather than the coarse-scoped BackendModule, which may not want the default parameter extraction logic. It also moves the 'yearMonth' parameter constant to the common package it's used in, and provides a basis for future consolidation of the ReportingEmailUtils and BillingEmailUtils classes, which have modest overlap.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=183130311
This uses an extensibility mechanism similar to that of WhoisCommandFactory
and CustomLogicFactory, namely, that a fully qualified Java class is
specified in the YAML file for each environment with the allocation token
custom logic to be used. By default, this points to a no-op base class
that does nothing. Users that wish to add their own allocation token
custom logic can simply create a new class that extends
AllocationTokenCustomLogic and then configure it in their .yaml config
files.
This also renames the existing *FlowCustomLogic *Flow instance variables
from customLogic to flowCustomLogic, to avoid the potential confusion with
the new AllocationTokenCustomLogic class that also now exists.
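A sketch of what an extension might look like (the method name and signature here are illustrative, not the actual hooks):

// Hypothetical subclass; configure its fully qualified name in the .yaml
// config file to have it used instead of the no-op base class.
public class MyAllocationTokenCustomLogic extends AllocationTokenCustomLogic {
  @Override
  public AllocationToken validateToken(AllocationToken token) throws EppException {
    // Custom checks go here; the no-op base class just returns the token.
    return token;
  }
}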
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=183003112
NotModifiedException was using HttpServletResponse.SC_NOT_FOUND instead of SC_NOT_MODIFIED (likely an autocomplete typo).
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=182976671
To make the FOSS build compile, third_party vendoring rules for jaxb are added to package all jaxb-related targets imported from Maven into an uber jar, mirroring the same practice done in //third_party/java/jaxb
Cloned from CL 182666460 by 'g4 patch'.
Original change by cushon@cushon:rosie182283995-0071_Rosie:47348:citc on 2018/01/20 13:36:15.
More information:
https://docs.google.com/document/d/1htErgDIoHMEuMBfGwrtS_O4WwhTw8QOGLva-7aYYvYs/edit?usp=sharing
Tested:
TAP --sample for global presubmit queue
[] passed FOSS test
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=182855173
This also updates to a newer version of Closure Rules and fixes a protobuf dep
compile issue.
Full description of the change:
Soy supports 2 kinds of loops:
* foreach- for iterating over items in a collection, e.g.
{foreach $item in $list}...{/foreach}
* for - for indexed iteration, e.g. {for $i in range(0, 10)}...{/for}
The reason Soy has two different loops is an accident of history; Soy didn’t use
to have a proper grammar for expressions and so the alternate ‘for...range’
syntax was added to make it possible to write indexed loops. As the grammar has
improved, having the two syntaxes is no longer necessary, and so we are
eliminating one of them.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=182843207