The objects stored in the relay buffer may leak memory when they are no longer used. Always remember to release their reference count in all cases.
Also save the relay channel and its name in BackendMetricsHandler when the handler is registered. This is because when retrying a relay, the write is sent as soon as the channel is connected, and the channelActive function has not been called yet.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208757730
It turns out in the edge case where a write occurs at the same moment that the
relay connection is terminated, the current retry mechanism is not sufficient
because it stores reference counted objects whose internal buffers are already
freed.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208738065
[1] Web whois should redirect to www.registry.google. whois.registry.google also points to the proxy IP, so redirecting to whois.registry.google just makes it loop. Also allow HEAD in web whois requests in case it is used for monitoring.
[2] Separately, there's a bug introduced in [] where exception handling of inbound messages was moved to HttpsRelayServiceHandler. However, the quota handlers are installed behind the HttpServiceServiceHandler in the channel pipeline, so exceptions thrown in the quota handlers were never processed. This resulted in hung connections when the quota was exceeded.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208651011
Tweaked a few logging levels to avoid spamming error-level logs. Also made it easier to debug issues when a relay retry fails.
[1] Put non-fatal exceptions that should be logged at warning in their explicit sets. Also always use the root cause to determine if an exception is non-fatal, because sometimes the actual causes are wrapped inside other exceptions.
[2] Record the cause of a relay failure, and record if a relay retry is successful. This way we can look at the log and figure out if a relay is eventually successful.
[3] Add a log when the frontend connection from the client is terminated.
[4] Always close the relay channel when a relay has failed, which, depending on whether the channel is frontend or backend, will reconnect and trigger a retry.
[5] Lastly changed failure test to use assertThrows instead of fail.
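The root-cause check described in [1] can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual code: the class name NonFatalErrors and the contents of the NON_FATAL set are assumptions made for the example.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.TimeoutException;

final class NonFatalErrors {
  // Hypothetical set of exception types that should only be logged at warning.
  static final Set<Class<? extends Throwable>> NON_FATAL =
      Set.of(IOException.class, TimeoutException.class);

  // Walks the cause chain to the innermost exception.
  static Throwable rootCause(Throwable t) {
    Throwable root = t;
    while (root.getCause() != null) {
      root = root.getCause();
    }
    return root;
  }

  // Classifies by the root cause, so wrapper exceptions do not hide the real error.
  static boolean isNonFatal(Throwable t) {
    Throwable root = rootCause(t);
    return NON_FATAL.stream().anyMatch(type -> type.isInstance(root));
  }
}
```

With this shape, an IOException wrapped in several layers of runtime exceptions is still classified as non-fatal.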
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208649916
We stopped updating the GCS bucket a while ago. The external repos should be sufficient.
Also added comment to explain dependency shadowing by closure rules.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208234650
16 is consistent with how we've generated codes for anchor tenants in the past.
Also gets rid of a space in the output so that it's a fully valid CSV.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208106631
This adds actual subdomain verification via the SafeBrowsing API to the Spec11
pipeline, as well as on-the-fly KMS decryption via the GenerateSpec11Action to
securely store our API key in source code.
Testing the interaction becomes difficult due to serialization requirements, and will be significantly expanded in the next cl. For now, it verifies basic end-to-end pipeline behavior.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208092942
The previous CL had a bug: non-200 responses are outbound errors and are not caught in the exceptionCaught() method.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208063877
This seems to fix the FOSS test timeout.
Also use the static-linked netty-tcnative library in tests to ensure that the
OpenSSL provider is always available in tests. In production, we should use
the dynamic-linked version to reduce binary footprint and rely on the system
OpenSSL library.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208057173
The "tar file encoding" saves the file + metadata (filename and modification time) in a "tar" format that is required in the RDE spec, even though it only contains a single file.
This is only relevant for RyDE, and not for Ghostryde. In fact, the only reason Ghostryde exists is to not have the TAR layer.
Currently we only encrypt RyDE, so we only need the TAR encoding. We plan to add decryption ability so we can test files we sent to IronMountain if there's a problem - so we will need TAR decoding for that.
The new file - RydeTar.java - has both encoding and decoding. We keep the format used for all other Input/OutputStreams for consistency, even though in this case it could be a private part of the RyDE encoder / decoder.
This is one of a series of CLs - each merging a single "part" of the encoding.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208056757
Masks user credentials (tags 'pw' and 'newPW') in EPP XML messages.
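One way such masking could work is a regular expression over the serialized XML. The sketch below is an assumption for illustration only: the class name EppCredentialMasker, the pattern, and the "*****" replacement are hypothetical and not taken from the actual change.

```java
import java.util.regex.Pattern;

final class EppCredentialMasker {
  // Matches <pw>...</pw> and <newPW>...</newPW>, with an optional namespace prefix.
  // Group 1 is the opening tag, group 3 the secret text, group 4 the closing tag.
  private static final Pattern PASSWORD =
      Pattern.compile(
          "(<(?:[A-Za-z0-9_-]+:)?(pw|newPW)\\b[^>]*>)([^<]*)(</(?:[A-Za-z0-9_-]+:)?\\2>)");

  // Replaces the text content of every matching element with a fixed mask.
  static String mask(String xml) {
    return PASSWORD.matcher(xml).replaceAll("$1*****$4");
  }
}
```

A fixed-length mask avoids leaking the length of the original credential in the logs.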
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207953894
Previously the SSL initializer tests always used the JDK provider, which does not really test what happens in production, where we take advantage of the OpenSSL provider. Now the tests will run with all providers that are available (through JUnit parameterization). Some bugs that may cause flakiness are fixed in the process.
Change how SNI is verified in tests. It turns out that the old method (only verifying the SSL parameters in the SSL engine) does not actually ensure that the SNI address is sent to the peer, but only that the SSL engine is configured to send it (this value exists even before a handshake is performed). Also there's likely a bug in Netty's SSL engine that does not set this parameter when created with a peer host.
Lastly HTTP test utils are changed so that they do not use pre-defined constants for header names and values. We want the test to confirm that these constants are what we expect they are. Using string literals makes these tests also more explicit.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207930282
The design doc is at []
The next step will be to tie this into the domain create flow, and if the domain
name is on a reserved list, allow it to be created if the token is specified that
has the given domain name on it.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207884521
Create a command to send arbitrary, authenticated HTTP requests to the backend
and remove the existing commands that are basically just wrappers around this.
Tested:
In addition to the unit tests, verified both get and post requests against
alpha.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207756509
[1] All logs should contain a reference to the channel so that it is easy to search for logs about a specific channel.
[2] EPP SSL handshake failure should be logged at warning. It is mostly the client that failed to complete the handshake, for example by sending a bad cert, or not sending a cert, or not using the correct SSL version. We should not log it at error and spam the log.
[3] When the EPP response is not 200, we should not log at error because it means that the GAE app responded successfully. For example when datastore contention occurs, App Engine responds with a non-200 status and logs at warning. The proxy should not log at a higher level than App Engine itself.
[4] Timeout is a non-fatal error that should be logged at warning.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207562299
The connection to GAE is not persistent and can drop. Reconnect when that happens, as long as the connection from the client is still active.
We need to consider the fact that while a reconnection is happening, the client may be sending requests that were relayed to the old connection and did not go through. In that case these requests are queued and will be retried when the new connection is available.
Since we are no longer tying the lifecycles of the two connections, we cannot automatically terminate one when another is terminated. Also we need to explicitly control how WHOIS connection is terminated, not depending on the HTTP connection header.
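The queue-and-retry behavior described above can be illustrated with a simplified, framework-free sketch. It deliberately leaves out Netty channels and reference counting; the class name RelayQueueSketch and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

final class RelayQueueSketch<M> {
  private final Deque<M> pending = new ArrayDeque<>();
  private Consumer<M> channel; // null while no relay connection is available

  // Sends immediately when connected, otherwise queues the message for later.
  synchronized void relay(M message) {
    if (channel == null) {
      pending.addLast(message);
    } else {
      channel.accept(message);
    }
  }

  // Called when the relay connection drops.
  synchronized void onDisconnected() {
    channel = null;
  }

  // Called when a new relay connection is ready; flushes queued messages in order.
  synchronized void onReconnected(Consumer<M> newChannel) {
    channel = newChannel;
    M message;
    while ((message = pending.pollFirst()) != null) {
      channel.accept(message);
    }
  }
}
```

Flushing in arrival order keeps requests from the same client in sequence across a reconnect.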
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207335498
All the pipeline-crashing problems should be fixed now, so we should have no
problem re-automating the invoice publish.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207265990
There's not much we can do when the user sends incorrect HTTP requests or cannot finish SSL handshake (the problematic requests are likely from bots anyway). Reducing the log level to warning in order to reduce spamming.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207159118
We are seeing some web WHOIS HTTP(S) requests made to our endpoints without the Host header specified. This is an error according to the HTTP/1.1 spec. However we do not want to spam our logs with errors that are outside of our control. Do not throw and return a 400 response instead.
Also re-worked the logic a bit to only return HSTS headers if we send a redirect response, not any other error responses. The tests are re-arranged to correspond with the logical flow in the code.
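The resulting decision flow might look like the sketch below. It is an illustration under assumptions: the class WebWhoisRedirectSketch, the 301 status, and the HSTS max-age value are hypothetical, and the real handler works on Netty HTTP messages rather than plain strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WebWhoisRedirectSketch {
  static final class Response {
    final int status;
    final Map<String, String> headers = new LinkedHashMap<>();

    Response(int status) {
      this.status = status;
    }
  }

  // hostHeader is the value of the request's Host header, or null if absent.
  static Response handle(String hostHeader, String redirectHost) {
    if (hostHeader == null || hostHeader.isEmpty()) {
      // Missing Host header: answer 400 instead of throwing, and add no HSTS header.
      return new Response(400);
    }
    Response redirect = new Response(301); // assumed redirect status
    redirect.headers.put("Location", "https://" + redirectHost + "/");
    // HSTS is attached only to redirect responses; the max-age value is an assumption.
    redirect.headers.put("Strict-Transport-Security", "max-age=31536000");
    return redirect;
  }
}
```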
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207143230
The server certificates and corresponding keys are encrypted by KMS and stored on GCS. This allows us to easily replace expiring certs without having to roll out a new proxy release. However currently the certificate is obtained as a singleton and used in all connections served by a proxy instance. This means that if we were to upload a new cert, all existing instances will not use it.
This CL makes it so that we only cache the certificate for 30 min, after which a new cert is fetched and decrypted. Local certificates used for testing are still singletons.
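A time-limited cache of this kind can be sketched as a supplier that reloads its value once a TTL has elapsed. The class ExpiringSupplier and the injectable clock are assumptions for illustration; the CL itself may use a different mechanism (Guava's Suppliers.memoizeWithExpiration offers similar behavior).

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class ExpiringSupplier<T> implements Supplier<T> {
  private final Supplier<T> delegate; // e.g. fetches and decrypts the certificate
  private final long ttlNanos;
  private final LongSupplier nanoClock; // injectable so tests can control time
  private T value;
  private long expiresAtNanos;
  private boolean loaded;

  ExpiringSupplier(Supplier<T> delegate, Duration ttl, LongSupplier nanoClock) {
    this.delegate = delegate;
    this.ttlNanos = ttl.toNanos();
    this.nanoClock = nanoClock;
  }

  @Override
  public synchronized T get() {
    long now = nanoClock.getAsLong();
    // Reload on first use or once the cached value has reached its TTL.
    if (!loaded || now - expiresAtNanos >= 0) {
      value = delegate.get();
      expiresAtNanos = now + ttlNanos;
      loaded = true;
    }
    return value;
  }
}
```

In production the clock would typically be System::nanoTime and the TTL Duration.ofMinutes(30).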
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206976318
*** Reason for rollback ***
It's still having the same issues from b/79463634 in sandbox, so we don't want to deploy it to prod.
*** Original change description ***
Switch pubapi/default service to basic scaling in prod/sandbox
Also goes back up to 100 max instances.
Hopefully this'll work better this time.
***
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206975159
Opened two ports (30010 and 30011 by default) that handle HTTP(S) GET requests. The HTTP request is redirected to the corresponding HTTPS site, whereas the HTTPS request is redirected to a site that supports web WHOIS.
The GCLB currently exposes port 443, but not port 80, on its TCP proxy load balancer (see https://cloud.google.com/load-balancing/docs/choosing-load-balancer). As a result, the HTTP traffic has to be routed by the HTTP load balancer, which requires a separate HTTP health check (as opposed to the TCP health check that the TCP proxy LB uses). This CL also added support for HTTP health check.
There is not a strong case for adding an end-to-end test for WebWhoisProtocolsModule (like those for EppProtocolModule, etc) as it just assembles standard HTTP codecs used for an HTTP server, plus the WebWhoisRedirectHandler, which is tested. The end-to-end test would just be testing if the Netty provided HTTP handlers correctly parse raw HTTP messages.
Several other small improvements are also included:
[1] Use setInt rather than set when setting content length in HTTP headers. I don't think it is necessary, but it is nevertheless a better practice to use a more specialized setter.
[2] Do not write metrics when running locally.
[3] Rename the qualifier @EppCertificates to @ServerCertificate as it now provides the certificate used in HTTPS traffic as well.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206944843
Also adjusts the nomulus list_cursors command to output the value of this field.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206646117
This only used to have effect for C++ LIPO, which has been
removed from Blaze.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206617307
We need to support web WHOIS on the same IP addresses that we use for port 43 whois. [] added support for HTTP(S) traffic on the proxy, which simply redirects to another website that actually hosts the web WHOIS service. This cl sets up the GCLB to route port 80 and port 443 traffic to the proxy.
We were using the TCP proxy load balancer for other protocols that we support (EPP and WHOIS), but the TCP proxy LB only exposes port 443, not port 80. For port 443, we simply follow the same pattern and add another TCP proxy LB. For port 80, we had to use the HTTP LB which exposes port 80 (on the same external IP addresses). This requires a different HTTP health check and a URL map. The added URL map is a dummy one that routes all paths to the same backend service that supports HTTP redirect.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206409007
When versions are explicitly set to the latest available version, Annealing almost always fails to apply the patch due to yet-unknown reasons. The rationale for setting the versions explicitly was to ensure that the clusters are always updated in time. But it seems like it is not worth the trouble.
Without the explicit latest versions, the master should still be upgraded automatically (though perhaps not immediately after a version becomes available):
https://cloud.google.com/kubernetes-engine/versioning-and-upgrades#automatic_master_upgrades
We also set "Auto Upgrade" on the nodes, which should upgrade the nodes to the master version (though perhaps not immediately after the master version upgrade).
So it seems without these lines, we can still expect the gke versions of the cluster to upgrade (eventually).
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206408347
non-compliant packages that depend on java_x_proto_library targets. This will enable blaze
to enforce strict_deps by default while missing dependencies are added to these packages.
Changes made using newly released blaze flag:
USE_CANARY_BLAZE=nightly blaze build -k --experimental_java_proto_library_enforce_strict_deps
then extracting the packages from the resulting add_dep commands, and for each package running:
buildozer 'add features -jpl_strict_deps' <package>:__pkg__
More information: []
Tested:
TAP sample presubmit queue
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206349847
ModulesService does not provide a great API. Specifically, it doesn't have a
way to get the hostname for a specific service; you have to get the hostname for
a specific version as well. This is very rarely what we want, as we publish new
versions every week and don't expect old ones to hang around for very long, so
a task should execute against whatever the live version is, not whatever the
current version was back when the task was enqueued (especially because that
version might be deleted by now).
This new and improved wrapper API removes the confusion and plays better with
dependency injection to boot. We can also fold in other methods having to do
with App Engine services, whereas ModulesService was quite limited in scope.
This also has the side effect of fixing ResaveEntityAction, which is
currently broken because the tasks it's enqueuing to execute up to 30 days in
the future have the version hard-coded into the hostname, and we typically
delete old versions sooner than that.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206173763
It broke because I forgot to add the new spec11 packages to gtld.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206021827
This adds the scaffolding for a basic Spec11 pipeline: it gathers all domains from all time for a given project and counts how many there are. I've factored out a few common utilities for beam pipelines to avoid excessive duplication.
Future CLs will:
- Actually process domains via the SafeBrowsing API
- Generate a real spec11 report
- Template queries based on the input YearMonth
- Abstract more commonalities across beam pipelines to reduce boilerplate when adding new pipelines.
TESTED: FOSS test passed, and ran successfully on alpha
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=205997741
This ensures that only one will run at a time, which should help fix the
clogged up mapreduces we've seen on sandbox.
In order to do this, the UnlockerOutput is introduced. This unlocks the
given Lock after all reducer shards have finished.
Also increases the lease duration of the DNS refresh action from 20 to
240 minutes. 20 minutes isn't long enough; when there's a lot of domains
and decent system load the mapreduce could take longer than that in the
ordinary case.
TESTED=Deployed to alpha and verified that more than one copy of the
mapreduce wouldn't run simultaneously, and also that the lock is
released when the mapreduce is finished.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=205887554
The "file encoding" saves the file + metadata (filename and modification time) in a "blob" format that PGP knows how to read.
Merges the file-encoder creation between RyDE and Ghostryde.
The new file - RydeFileEncoding.java - is a merge of the removed functions in
Ghostryde.java and the RydePgpFileOutputStream.java.
This is one of a series of CLs - each merging a single "part" of the encoding.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=205295756
I'm finally fed up enough with all the nameserver changes we've had to make on our
self-allocated domains to improve the command. Now you can simply run:
$ nomulus ... update_domain ... -n ns[1-4].foo.bar
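Assuming the bracket syntax expands a numeric range into individual nameserver hostnames, the expansion could be sketched like this. The class NameserverRange and its exact parsing rules are hypothetical, not the command's real implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameserverRange {
  // prefix[start-end]suffix, e.g. ns[1-4].foo.bar
  private static final Pattern RANGE = Pattern.compile("^(.*)\\[(\\d+)-(\\d+)\\](.*)$");

  // Expands a range spec into hostnames; a spec without brackets is returned as-is.
  static List<String> expand(String spec) {
    Matcher m = RANGE.matcher(spec);
    if (!m.matches()) {
      return List.of(spec);
    }
    int start = Integer.parseInt(m.group(2));
    int end = Integer.parseInt(m.group(3));
    if (start > end) {
      throw new IllegalArgumentException("Range start exceeds end: " + spec);
    }
    List<String> hosts = new ArrayList<>();
    for (int i = start; i <= end; i++) {
      hosts.add(m.group(1) + i + m.group(4));
    }
    return hosts;
  }
}
```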
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=205282317
Domains that are reserved with type NAME_COLLISION can be registered defensively
during sunrise only, but DNS can never resolve for them. Correspondingly, we
need to apply the SERVER_HOLD status for such registrations. We also send the
registrar a poll message informing them of this act.
This brings us up to feature parity with end-date sunrise (implemented in
DomainAllocateFlow), which already has all of this handling.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=205277728
Merges the encryptor creation between RyDE and Ghostryde.
The new file - RydeEncryption.java - is a merge of the removed functions in
Ghostryde.java and the RydePgpEncryptionOutputStream.java.
This is one of a series of CLs - each merging a single "part" of the encoding.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=205246053
Merges the compressor creation between RyDE and Ghostryde. Note that GhostRyde
will now compress with ZIP rather than the previous ZLIB. This is backwards
compatible because the decompression algorithm works with either, so files
created by the old version (with ZLIB) can still be opened by the new version,
and vice-versa.
The new file - RydeCompression.java - is a merge of the removed functions in Ghostryde.java and the RydePgpCompressionOutputStream.java.
This is one of a series of CLs - each merging a single "part" of the encoding.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=205102150
Ghostryde.java has a lot of duplicate code with RydeEncoder and the future
RydeDecoder - the encryption/decryption, compression/decompression, file
encoding/decoding. The "de-XXX" part of each of these pairs needs to read a PGP
object from a stream using PGPObjectFactory.
Since we want to move the duplicate code into their own files, we will need to
move the "read PGP objects from stream" functions to a common utility class.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=205092800
We used to set the installer as executable on X20 and have kokoro copy it to
the temp folder and run it from there. Now that executables on x20 must be
built verifiable, we cannot set the +x bit any more.
Instead, run the script as an argument to the bash command.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=204536872
This prepares for the spec11 beam pipeline to live parallel to the invoicing
beam pipeline, for better organization.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=204980582
Second step of RDE encoding refactoring.
Creates a single OutputStream to encode RyDE files.
This replaces the 5 OutputStreams that were needed before.
Also removes all the factories that were injected. It's an encoding, there's no point in injecting it.
Finally, removed the buffer-size configuration and replaced with a static final
const value in each individual OutputStream.
This doesn't yet include a decoder (InputStream). And there's still a lot of overlap between the Ryde and the Ghostryde code. Both of those are left for the next CLs.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=204898369
This officially adds a 15% discount to sunrise creates and makes anchor tenant
creates free for the first 2 years.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=204805141
Also goes back up to 100 max instances.
Hopefully this'll work better this time.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=204783809
Lists used as accumulators were being updated individually for each domain
without starting over from a fresh list each time, so the number of changes
would grow for each additional domain and potentially be wrong if the previous
domains were set up differently.
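The shape of the fix is to allocate a fresh accumulator for every domain. The sketch below is a hypothetical reconstruction with invented names (NameserverChanges, additionsPerDomain); it only illustrates the pattern, not the actual code that was changed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NameserverChanges {
  // For each domain, lists the desired nameservers that the domain does not have yet.
  static Map<String, List<String>> additionsPerDomain(
      Map<String, Set<String>> currentByDomain, Collection<String> desired) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (Map.Entry<String, Set<String>> entry : currentByDomain.entrySet()) {
      // Fresh list per domain. Hoisting this above the loop would reproduce the bug:
      // changes computed for earlier domains would carry over into later ones.
      List<String> toAdd = new ArrayList<>();
      for (String nameserver : desired) {
        if (!entry.getValue().contains(nameserver)) {
          toAdd.add(nameserver);
        }
      }
      result.put(entry.getKey(), toAdd);
    }
    return result;
  }
}
```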
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=204526006
This means that, when writing new tests that are failing, you get much more
useful logs that show the actual XML in a more comprehensible format that is
suitable for pasting back into the golden file in the test (if the change was
intended).
This requires outputting the standalone parameter in the XML transformer, and
some minor changes to some tests as a result that were relying on it being
stripped out.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=204513690
This also introduces a production canary environment, similar to sandbox canary. The docker tags are changed to "live" and "sandbox" respectively, to reflect the fact that different images may be used for prod and sandbox.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=204343530