As part of b/36599833, this makes FlowReporter log the TLD(s) of every domain
flow it executes, so we can provide ICANN reporting totals on a per-TLD basis.
It also adds several other fields that we're computing anyway and which seem
useful, particularly for debugging any issues we see in production with the data
that we're attempting to record for ICANN reporting. The full set of fields is:
- commandType (e.g. "create", "info", "transfer")
- resourceType* (e.g. "domain", "contact", "host")
- flowClassName (e.g. "ContactCreateFlow", "DomainRestoreRequestFlow")
- targetId* (e.g. "ns1.foo.com", "bar.org", "contact-1234")
- targetIds* - plural of the above, for multi-resource checks
- tld** (e.g. "com", "co.uk") - extracted from targetId, lowercased
- tlds** - plural of the above, deduplicated, for multi-resource checks
* = only non-empty for resource flows (not e.g. login, logout, poll)
** = only non-empty for domain flows
Note that TLD extraction is deliberately very lenient, which avoids the added
complexity of validating the domain names twice in the common case.
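For illustration, lenient extraction could look roughly like the sketch below.
It assumes that "lenient" means taking everything after the first dot of a
second-level domain name and lowercasing it, with no validation at all. The
class and method names are hypothetical and are not taken from FlowReporter.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of lenient TLD extraction; not the actual FlowReporter code. */
final class LenientTldSketch {

  /**
   * Guesses the TLD by taking everything after the first dot and lowercasing it.
   * Nothing is validated; an id without a dot yields an empty Optional.
   */
  static Optional<String> extractTld(String targetId) {
    int firstDot = targetId.indexOf('.');
    return firstDot == -1
        ? Optional.empty()
        : Optional.of(targetId.substring(firstDot + 1).toLowerCase(Locale.ROOT));
  }

  /** Collects the guessed TLDs of several ids, deduplicated in first-seen order. */
  static Set<String> extractTlds(List<String> targetIds) {
    Set<String> tlds = new LinkedHashSet<>();
    for (String id : targetIds) {
      extractTld(id).ifPresent(tlds::add);
    }
    return tlds;
  }

  public static void main(String[] args) {
    System.out.println(extractTld("bar.org"));        // Optional[org]
    System.out.println(extractTld("Example.CO.UK"));  // Optional[co.uk]
    System.out.println(extractTld("contact-1234"));   // Optional.empty
    System.out.println(extractTlds(Arrays.asList("a.com", "b.COM", "c.co.uk")));  // [com, co.uk]
  }
}
```

Under this assumption a host name such as "ns1.foo.com" would yield "foo.com",
which is consistent with tld/tlds only being populated for domain flows.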
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=154070794
This prevents a possible failure mode of the logging where the logged
EPP input XML is very large (which can happen e.g. for domain creates
with large SMD values). In those cases, the XML might cause the overall
JSON string to be too large to fit within a single log entry [1], in which
case it gets split over multiple lines and breaks automatic parsing.
This mitigates that case by logging the EPP input (raw and base64-encoded)
in a separate log statement so that the more compact metadata (like clientId)
and derived values (like the ICANN reporting fields) will still be in an intact
JSON string even in that case, and can still be readily parsed. It's okay
if the actual EPP XML is harder to parse, since once we're logging the right
metadata fields we shouldn't need to automatically parse the EPP XML in any
normal cases.
[1] I haven't found this exact limit or splitting algorithm, or whether it's
a property of Java logging or GAE log ingestion. The GAE logs page does note
that a single application log entry (within a request, which can have up to
1000 such entries) maxes out at 8KB, so that might be it:
https://cloud.google.com/appengine/docs/standard/java/logs/#writing_application_logs
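The split described above (EPP input in one statement, compact metadata in
another) might be sketched as follows. The log prefixes, the JSON field names
and the hand-rolled JSON helper are all assumptions made to keep the example
self-contained; the real code may use different names and a JSON library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical sketch of logging EPP input and metadata as two separate statements. */
final class SplitLoggingSketch {
  private static final Logger logger = Logger.getLogger(SplitLoggingSketch.class.getName());

  /** Minimal JSON string quoting, for illustration only. */
  static String quote(String s) {
    StringBuilder sb = new StringBuilder("\"");
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"': sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c));  // other control characters
          } else {
            sb.append(c);
          }
      }
    }
    return sb.append('"').toString();
  }

  /** Serializes a flat string-to-string map as a JSON object, preserving insertion order. */
  static String toJson(Map<String, String> fields) {
    StringBuilder sb = new StringBuilder("{");
    String separator = "";
    for (Map.Entry<String, String> e : fields.entrySet()) {
      sb.append(separator).append(quote(e.getKey())).append(':').append(quote(e.getValue()));
      separator = ",";
    }
    return sb.append('}').toString();
  }

  /** Potentially huge line: the EPP input, both raw and base64-encoded. */
  static String eppInputLine(byte[] inputXmlBytes) {
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("xml", new String(inputXmlBytes, StandardCharsets.UTF_8));
    fields.put("xmlBytes", Base64.getEncoder().encodeToString(inputXmlBytes));
    return "EPP-INPUT-SKETCH: " + toJson(fields);
  }

  /** Compact line: its size does not depend on the size of the EPP input. */
  static String metadataLine(Map<String, String> metadata) {
    return "EPP-METADATA-SKETCH: " + toJson(metadata);
  }

  /** Emits two log statements so an oversized input cannot break the metadata JSON. */
  static void record(byte[] inputXmlBytes, Map<String, String> metadata) {
    logger.info(eppInputLine(inputXmlBytes));
    logger.info(metadataLine(metadata));
  }
}
```

Because the metadata statement never embeds the XML, it stays small even when
the input line is large enough to be split by the logging pipeline.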
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=153771335
Since this reporting is getting more complicated (see b/36599833), it'll
be better to have a dedicated class to encapsulate it, which also lets us
keep the tests separate and focus FlowRunner more on its core purpose of
actually running the flow.
Note that this doesn't move the legacy log statement, because that
specifically must be logged from the FlowRunner.run() method to preserve
the existing log signature matching in our ICANN activity reporting query.
(The new statement is designed to be robust to moves like this since it
doesn't use the logging callsite to match log lines, and it's not in use
yet anyway.)
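To illustrate why matching on message content survives a refactoring like
this one while matching on the callsite does not, here is a small stand-in
using java.util.logging records. The real reporting query runs over ingested
logs rather than LogRecord objects, and the signature string and matcher
names below are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.LogRecord;

/** Hypothetical comparison of callsite-based and signature-based log matching. */
final class LogMatchingSketch {
  // Hypothetical marker placed at the start of the new-style log message.
  static final String SIGNATURE = "FLOW-METADATA-SKETCH";

  /** Legacy-style match: depends on where the log statement lives in the code. */
  static boolean matchesByCallsite(LogRecord record) {
    return "FlowRunner".equals(record.getSourceClassName())
        && "run".equals(record.getSourceMethodName());
  }

  /** New-style match: depends only on the message text. */
  static boolean matchesBySignature(LogRecord record) {
    String message = record.getMessage();
    return message != null && message.startsWith(SIGNATURE + ": ");
  }

  /** Builds a record as if it had been logged from the given class and method. */
  static LogRecord recordFrom(String className, String methodName, String message) {
    LogRecord record = new LogRecord(Level.INFO, message);
    record.setSourceClassName(className);
    record.setSourceMethodName(methodName);
    return record;
  }

  public static void main(String[] args) {
    String message = SIGNATURE + ": {\"clientId\":\"TheRegistrar\"}";
    LogRecord before = recordFrom("FlowRunner", "run", message);
    LogRecord after = recordFrom("FlowReporter", "recordToLogs", message);
    System.out.println(matchesByCallsite(before) + " " + matchesByCallsite(after));    // true false
    System.out.println(matchesBySignature(before) + " " + matchesBySignature(after));  // true true
  }
}
```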
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=153762008