Commit graph

24 commits

Author SHA1 Message Date
jianglai
3cfde5d4a1 Fix EPP quota handling bug
We limit the maximum number of concurrent connections that a client can make the proxy. The quota is implemented as a (thread-safe) map of client certificate hash to available number of connections. When a new connection is made, we decrement the availability counter by one. When the counter hits zero, no more connections can be made and any new connection from the same client is terminated by the proxy.

Currently, the counter is incremented when a connection is terminated, including connections that are terminated *because* the quota is reached (i. e. the connections for which the counter is not decremented because the counter is already zero). This means that the first time the quota is reached, the next connection is dropped, the counter is incremented to 1 and new connections can be made again, bypassing the quota. This process can be repeated to achieve, theoretically, infinite quota.

This CL fixes this bug by only incrementing the counter, upon connection termination, for connections that have decremented the counter in the first place.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=217231593
2018-10-17 11:56:04 -04:00
jianglai
3fc7271145 Move GCP proxy code to the old [] proxy's location
1. Moved code for the GCP proxy to where the [] proxy code used to live.
3. Corrected reference to the GCP proxy location.
4. Misc changes to make ErrorProne and various tools happy.

+diekmann to LGTM terraform whitelist change.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=213630560
2018-09-20 11:19:36 -04:00
mcilwain
a483beef28 Add MOE equivalence for 2018-09-14 sync
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=212996616
2018-09-20 11:19:36 -04:00
Liam Miller-Cushon
3b62a51424 Remove a reference to sun.security.provider.certpath
sun.security.provider.certpath.SunCertPathBuilderException is not visible in JDK 9,
see: bazelbuild/bazel#6154

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=213068821
2018-09-14 21:44:55 -04:00
guyben
b588c57526 Create a NettyRule
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=212806539
2018-09-14 11:52:13 -04:00
jianglai
2e4e542205 Improve SSL initializer tests
Got rid of the ugly use of locks and consolidate synchronization between I/O thread and test thread using count down latches. I believe this makes the code much cleaner and easy to reason about.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208739770
2018-08-20 14:04:50 -04:00
jianglai
2e2898e17c Fix WHOIS issues
[1] Web whois should redirect to www.registry.google. whois.registry.google also points to the proxy IP, so redirecting to whois.registry.google just makes it loop. Also allow HEAD in web whois request in case that is used in monitoring.

[2] Separately, there's a bug introduced in [] where exception handling of inbound messages is moved to HttpsRelayServiceHandler. However the quota handlers are installed behind the HttpServiceServiceHandler in the channel pipeline, therefore the exception thrown in quota handlers never got processed. This results in hung connection when quota exceeded.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208651011
2018-08-20 14:00:08 -04:00
jianglai
0e64015cdf Improve logs in the GCP proxy
Tweaked a few logging levels to not spam error level logs. Also make it easy to debug issues in case relay retry fails.

[1] Put non-fatal exceptions that should be logged at warning in their explicit sets. Also always use the root cause to determine if an exception is non-fatal, because sometimes the actual causes are wrapped inside other exceptions.

[2] Record the cause of a relay failure, and record if a relay retry is successful. This way we can look at the log and figure out if a relay is eventually successful.

[3] Add a log when the frontend connection from the client is terminated.

[4] Alway close the relay channel when a relay has failed, which, depend on if the channel is frontend or backend, will reconnect and trigger a retry.

[5] Lastly changed failure test to use assertThrows instead of fail.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208649916
2018-08-20 13:58:30 -04:00
jianglai
9eec70729f Refine tests in GCP proxy
Previously the ssl initializer tests always uses JDK, which is not really testing what happens in production when we take advantage of the OpenSSL provider. Now the tests will run with all providers that are available (through JUnit parameterization). Some bugs that may cause flakiness are fixed in the process.

Change how SNI is verified in tests. It turns out that the old method (only verifying the SSL parameters in the SSL engine) does not actually ensure that the SNI address is sent to the peer, but only that the SSL engine is configured to send it (this value exists even before a handshake is performed). Also there's likely a bug in Netty's SSL engine that does not set this parameter when created with a peer host.

Lastly HTTP test utils are changed so that they do not use pre-defined constants for header names and values. We want the test to confirm that these constants are what we expect they are. Using string literals makes these tests also more explicit.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207930282
2018-08-10 13:46:48 -04:00
jianglai
4ff77fb370 Automatic reconnect to GAE when the connection is dropped
The connection to GAE is not persistent and can drop. Reconnect when that happens, as long as the connection from the client is still active.

We need to consider the fact that while a reconnection is happening, the client may be sending requests that was relayed to the old connection, which is not going through. In that case these requests are queued and will be retried when the new connection is available.

Since we are no longer tying the lifecycles of the two connections, we cannot automatically terminate one when another is terminated. Also we need to explicitly control how WHOIS connection is terminated, not depending on the HTTP connection header.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207335498
2018-08-10 13:46:48 -04:00
jianglai
8664101687 Make web WHOIS more resilient to malformed requests
We are seeing some web WHOIS HTTP(S) requests made to our endpoints without the Host header specified. This is an error according to the HTTP/1.1 spec. However we do not want to spam our logs with errors that are outside of our control. Do not throw and return a 400 response instead.

Also re-worked the logic a bit to only return HSTS headers if we send a redirect response, not any other error responses. The tests are re-arrange to correspond with the logical flow in the code.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=207143230
2018-08-10 13:46:48 -04:00
jianglai
628aacd754 Cache server certificates for up to 30 min
The server certificates and corresponding keys are encrypted by KMS and stored on GCS. This allows us to easily replace expiring certs without having to roll out a new proxy release. However currently the certificate is obtained as a singleton and used in all connections served by a proxy instance. This means that if we were to upload a new cert, all existing instances will not use it.

This CL makes it so that we only cache the certificate for 30 min, after which a new cert is fetched and decrypted. Local certificates used for testing are still singletons.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206976318
2018-08-10 13:46:48 -04:00
jianglai
4a5b317016 Add web WHOIS redirect support
Opened two ports (30010 and 30011 by default) that handles HTTP(S) GET requests. the HTTP request is redirected to the corresponding HTTPS site, whereas the HTTPS request is redirected to a site that supports web WHOIS.

The GCLB currently exposes port 80, but not port 443 on its TCP proxy load balancer (see https://cloud.google.com/load-balancing/docs/choosing-load-balancer). As a result, the HTTP traffic has to be routed by the HTTP load balancer, which requires a separate HTTP health check (as opposed to the TCP health check that the TCP proxy LB uses). This CL also added support for HTTP health check.

There is not a strong case for adding an end-to-end test for WebWhoisProtocolsModule (like those for EppProtocolModule, etc) as it just assembles standard HTTP codecs used for an HTTP server, plus the WebWhoisRedirectHandler, which is tested. The end-to-end test would just be testing if the Netty provided HTTP handlers correctly parse raw HTTP messages.

Sever other small improvement is also included:

[1] Use setInt other than set when setting content length in HTTP headers. I don't think it is necessary, but it is nevertheless a better practice to use a more specialized setter.
[2] Do not write metrics when running locally.
[3] Rename the qualifier @EppCertificates to @ServerSertificate as it now provides the certificate used in HTTPS traffic as well.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=206944843
2018-08-10 13:46:48 -04:00
jianglai
8f5be6e7a8 Make some minor changes to logging messages and test names.
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=205464581
2018-08-10 13:44:25 -04:00
jianglai
0cb303ed7f Fix proxy metrics instrumentation bug
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=197209531
2018-05-30 12:18:54 -04:00
jianglai
c72e01f75e Clean up some code quality issues in GCP proxy
All changes are suggested by IntelliJ code inspection.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=189586104
2018-03-19 18:44:12 -04:00
jianglai
00bf8a999f Handle malformed proxy protocol header
If the proxy protocol header contains a malformatted string, such as "PROXY UNKNOWN", instead of throwing and killing the connection, use the TCP source IP as the remote IP.

Also changed how the header is read from the buffer, to avoid a potential Netty resource leak. Originally the header is read into another ByteBuf, which needs be be explicit released in order for Netty to reclaim its memory (http://netty.io/wiki/reference-counted-objects.html). Now we just read it into a byte array and let JVM GC it.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=188047084
2018-03-06 19:26:31 -05:00
cushon
606b470cd0 Merge JUnitBackport's expectThrows into assertThrows
More information: https://github.com/junit-team/junit5/issues/531

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=187034408
2018-03-06 18:56:15 -05:00
jianglai
1013e047b4 Make SSL failure test more robust
A recent change in Netty 4.1.21 (978a46cc0a) tried to fix an issue where channels might be closed before any handshake exception can be propagated. This however introduced a regression where the the connection is not closed at all after a handshake failure, which caused test failures because we were expecting the connection to be closed after a handshake failure.

We rolled back dependency on Netty 4.1.21 so that the test would pass. A fix upstream is schedule for 4.1.22 (https://github.com/netty/netty/pull/7727).

However this does reveal some potential problem in our tests. Namely we did not wait for the connection to be closed before assertion on it. The old Netty behavior closes the connection before handshake exception is thrown, and we *do* wait for the handshake exception. The connection assertion happens after the handshake exception is verified, so by then the connection is always closed.

When the upstream fix is released, we'd run into concurrency problem described above. So we instead wait for the connection to be closed before checking handshake exception (by releasing the lock in a channel close listener), which guarantees that when we check the connection, it is always closed.

Also fixes some javadoc errors.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=186021997
2018-02-20 15:59:06 -05:00
mcilwain
15f871a605 Temporarily disable channel alive assertions in SSL fai...
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185990383
2018-02-20 15:52:47 -05:00
jianglai
738994b6f6 Temporarily disable channel alive assertions in SSL failure tests
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185868539
2018-02-20 15:49:47 -05:00
jianglai
ce5baafc4a Register quota metrics in GCP proxy
When a quota request is rejected, increment the metric counter by one.

Also makes both frontend and backend metrics singleton because all the fields they have a static.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185146804
2018-02-20 15:39:15 -05:00
jianglai
6ca523386a Add QuotaHandler to GCP proxy
The quota handler terminates connections when quota is exceeded.

The next CL will add instrumentation for quota related metrics.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=185042675
2018-02-20 15:36:23 -05:00
jianglai
7e42ee48a4 Open source GCP proxy
Dagger updated to 2.13, along with all its dependencies.

Also allows us to have multiple config files for different environment (prod, sandbox, alpha, local, etc) and specify which one to use on the command line with a --env flag. Therefore the same binary can be used in all environments.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=176551289
2017-11-21 19:19:03 -05:00