We limit the maximum number of concurrent connections that a client can make the proxy. The quota is implemented as a (thread-safe) map of client certificate hash to available number of connections. When a new connection is made, we decrement the availability counter by one. When the counter hits zero, no more connections can be made and any new connection from the same client is terminated by the proxy.
Currently, the counter is incremented when a connection is terminated, including connections that are terminated *because* the quota is reached (i. e. the connections for which the counter is not decremented because the counter is already zero). This means that the first time the quota is reached, the next connection is dropped, the counter is incremented to 1 and new connections can be made again, bypassing the quota. This process can be repeated to achieve, theoretically, infinite quota.
This CL fixes this bug by only incrementing the counter, upon connection termination, for connections that have decremented the counter in the first place.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=217231593
1. Moved code for the GCP proxy to where the [] proxy code used to live.
3. Corrected reference to the GCP proxy location.
4. Misc changes to make ErrorProne and various tools happy.
+diekmann to LGTM terraform whitelist change.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=213630560
Got rid of the ugly use of locks and consolidate synchronization between I/O thread and test thread using count down latches. I believe this makes the code much cleaner and easy to reason about.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208739770
[1] Web whois should redirect to www.registry.google. whois.registry.google also points to the proxy IP, so redirecting to whois.registry.google just makes it loop. Also allow HEAD in web whois request in case that is used in monitoring.
[2] Separately, there's a bug introduced in [] where exception handling of inbound messages is moved to HttpsRelayServiceHandler. However the quota handlers are installed behind the HttpServiceServiceHandler in the channel pipeline, therefore the exception thrown in quota handlers never got processed. This results in hung connection when quota exceeded.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208651011
Tweaked a few logging levels to not spam error level logs. Also make it easy to debug issues in case relay retry fails.
[1] Put non-fatal exceptions that should be logged at warning in their explicit sets. Also always use the root cause to determine if an exception is non-fatal, because sometimes the actual causes are wrapped inside other exceptions.
[2] Record the cause of a relay failure, and record if a relay retry is successful. This way we can look at the log and figure out if a relay is eventually successful.
[3] Add a log when the frontend connection from the client is terminated.
[4] Alway close the relay channel when a relay has failed, which, depend on if the channel is frontend or backend, will reconnect and trigger a retry.
[5] Lastly changed failure test to use assertThrows instead of fail.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208649916
Previously the ssl initializer tests always uses JDK, which is not really testing what happens in production when we take advantage of the OpenSSL provider. Now the tests will run with all providers that are available (through JUnit parameterization). Some bugs that may cause flakiness are fixed in the process.
Change how SNI is verified in tests. It turns out that the old method (only verifying the SSL parameters in the SSL engine) does not actually ensure that the SNI address is sent to the peer, but only that the SSL engine is configured to send it (this value exists even before a handshake is performed). Also there's likely a bug in Netty's SSL engine that does not set this parameter when created with a peer host.
Lastly HTTP test utils are changed so that they do not use pre-defined constants for header names and values. We want the test to confirm that these constants are what we expect they are. Using string literals makes these tests also more explicit.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207930282
The connection to GAE is not persistent and can drop. Reconnect when that happens, as long as the connection from the client is still active.
We need to consider the fact that while a reconnection is happening, the client may be sending requests that was relayed to the old connection, which is not going through. In that case these requests are queued and will be retried when the new connection is available.
Since we are no longer tying the lifecycles of the two connections, we cannot automatically terminate one when another is terminated. Also we need to explicitly control how WHOIS connection is terminated, not depending on the HTTP connection header.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207335498
We are seeing some web WHOIS HTTP(S) requests made to our endpoints without the Host header specified. This is an error according to the HTTP/1.1 spec. However we do not want to spam our logs with errors that are outside of our control. Do not throw and return a 400 response instead.
Also re-worked the logic a bit to only return HSTS headers if we send a redirect response, not any other error responses. The tests are re-arrange to correspond with the logical flow in the code.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207143230
The server certificates and corresponding keys are encrypted by KMS and stored on GCS. This allows us to easily replace expiring certs without having to roll out a new proxy release. However currently the certificate is obtained as a singleton and used in all connections served by a proxy instance. This means that if we were to upload a new cert, all existing instances will not use it.
This CL makes it so that we only cache the certificate for 30 min, after which a new cert is fetched and decrypted. Local certificates used for testing are still singletons.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206976318
Opened two ports (30010 and 30011 by default) that handles HTTP(S) GET requests. the HTTP request is redirected to the corresponding HTTPS site, whereas the HTTPS request is redirected to a site that supports web WHOIS.
The GCLB currently exposes port 80, but not port 443 on its TCP proxy load balancer (see https://cloud.google.com/load-balancing/docs/choosing-load-balancer). As a result, the HTTP traffic has to be routed by the HTTP load balancer, which requires a separate HTTP health check (as opposed to the TCP health check that the TCP proxy LB uses). This CL also added support for HTTP health check.
There is not a strong case for adding an end-to-end test for WebWhoisProtocolsModule (like those for EppProtocolModule, etc) as it just assembles standard HTTP codecs used for an HTTP server, plus the WebWhoisRedirectHandler, which is tested. The end-to-end test would just be testing if the Netty provided HTTP handlers correctly parse raw HTTP messages.
Sever other small improvement is also included:
[1] Use setInt other than set when setting content length in HTTP headers. I don't think it is necessary, but it is nevertheless a better practice to use a more specialized setter.
[2] Do not write metrics when running locally.
[3] Rename the qualifier @EppCertificates to @ServerSertificate as it now provides the certificate used in HTTPS traffic as well.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206944843
If the proxy protocol header contains a malformatted string, such as "PROXY UNKNOWN", instead of throwing and killing the connection, use the TCP source IP as the remote IP.
Also changed how the header is read from the buffer, to avoid a potential Netty resource leak. Originally the header is read into another ByteBuf, which needs be be explicit released in order for Netty to reclaim its memory (http://netty.io/wiki/reference-counted-objects.html). Now we just read it into a byte array and let JVM GC it.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=188047084
A recent change in Netty 4.1.21 (978a46cc0a) tried to fix an issue where channels might be closed before any handshake exception can be propagated. This however introduced a regression where the the connection is not closed at all after a handshake failure, which caused test failures because we were expecting the connection to be closed after a handshake failure.
We rolled back dependency on Netty 4.1.21 so that the test would pass. A fix upstream is schedule for 4.1.22 (https://github.com/netty/netty/pull/7727).
However this does reveal some potential problem in our tests. Namely we did not wait for the connection to be closed before assertion on it. The old Netty behavior closes the connection before handshake exception is thrown, and we *do* wait for the handshake exception. The connection assertion happens after the handshake exception is verified, so by then the connection is always closed.
When the upstream fix is released, we'd run into concurrency problem described above. So we instead wait for the connection to be closed before checking handshake exception (by releasing the lock in a channel close listener), which guarantees that when we check the connection, it is always closed.
Also fixes some javadoc errors.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=186021997
When a quota request is rejected, increment the metric counter by one.
Also makes both frontend and backend metrics singleton because all the fields they have a static.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185146804
The quota handler terminates connections when quota is exceeded.
The next CL will add instrumentation for quota related metrics.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185042675
Dagger updated to 2.13, along with all its dependencies.
Also allows us to have multiple config files for different environment (prod, sandbox, alpha, local, etc) and specify which one to use on the command line with a --env flag. Therefore the same binary can be used in all environments.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=176551289