Update input/output of Spec11 pipeline to final format

This changes the BigQuery input to the fields we ultimately want (fqdn,
registrarName, registrarEmailAddress) and the output to a structured POJO
holding the results from the API. This POJO is then converted to its final text output, i.e.:

Map from registrar e-mail to list of threat-detected subdomains:
{"registrarEmail": "c@fake.com", "threats": [{"url": "a.com", "threatType": "MALWARE"}]}
{"registrarEmail": "d@fake.com", "threats": [{"url": "x.com", "threatType": "MALWARE"}, {"url": "y.com", "threatType": "MALWARE"}]}

This gives us all the data we want in a JSON structured format, to be acted upon downstream by the to-be-constructed PublishSpec11ReportAction. Ideally, we would send an e-mail directly from the beam pipeline, but this is only possible through third-party providers (as opposed to app engine itself).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=209416880
This commit is contained in:
larryruili 2018-08-20 07:54:31 -07:00 committed by jianglai
parent 7dcadaecf6
commit f7bc17fbe8
11 changed files with 393 additions and 130 deletions

View file

@ -17,6 +17,8 @@ package google.registry.beam;
import com.google.common.base.Joiner;
import com.google.common.collect.ImmutableList;
import com.google.common.flogger.FluentLogger;
import com.google.common.io.Resources;
import google.registry.util.ResourceUtils;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord;
@ -54,4 +56,12 @@ public class BeamUtils {
missingFieldList, record));
}
}
/**
* Returns the {@link String} contents for a file in the {@code sql/} directory relative to a
* class.
*/
public static String getQueryFromFile(Class<?> clazz, String filename) {
return ResourceUtils.readResourceUtf8(Resources.getResource(clazz, "sql/" + filename));
}
}