Add preliminary spec11 monthly pipeline

This adds the scaffolding for a basic Spec11 pipeline: it gathers all domains from all time for a given project and counts them. I've factored out a few common utilities for Beam pipelines to avoid excessive duplication.

Future CLs will:
- Actually process domains via the SafeBrowsing API
- Generate a real Spec11 report
- Template queries based on the input YearMonth
- Abstract more commonalities across Beam pipelines to reduce boilerplate when adding new pipelines
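
As a rough illustration of the "template queries based on the input YearMonth" item, here is a minimal sketch using only java.time. The class name, the placeholder tokens (%PROJECT%, %START%, %END%), and the table and column names are all hypothetical and are not taken from this change; the actual templating in a later CL may look quite different.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper (not part of this commit): fills a query template for one month. */
final class MonthlyQueryTemplate {

  // Placeholder query; the dataset, table, and column names are assumptions for illustration.
  static final String TEMPLATE =
      "SELECT fullyQualifiedDomainName FROM `%PROJECT%.example_dataset.Domain` "
          + "WHERE creationTime >= '%START%' AND creationTime < '%END%'";

  /** Substitutes the project id and the half-open date range [first day, first day of next month). */
  static String render(String template, String projectId, YearMonth yearMonth) {
    DateTimeFormatter iso = DateTimeFormatter.ISO_LOCAL_DATE;
    return template
        .replace("%PROJECT%", projectId)
        .replace("%START%", yearMonth.atDay(1).format(iso))
        .replace("%END%", yearMonth.plusMonths(1).atDay(1).format(iso));
  }

  public static void main(String[] args) {
    // Prints the query bounded to July 2018 for a made-up project id.
    System.out.println(render(TEMPLATE, "my-project", YearMonth.of(2018, 7)));
  }
}
```

Using a half-open range keeps the month boundary logic in YearMonth, so December rolls over to January of the next year without special handling.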

TESTED: FOSS test passed, and ran successfully on alpha

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=205997741
larryruili 2018-07-25 08:34:58 -07:00 committed by jianglai
parent ded40851d3
commit d199b383e5
14 changed files with 252 additions and 38 deletions

@@ -67,8 +67,8 @@ public class InvoicingPipeline implements Serializable {
   String invoiceTemplateUrl;
   @Inject
-  @Config("invoiceStagingUrl")
-  String invoiceStagingUrl;
+  @Config("beamStagingUrl")
+  String beamStagingUrl;
   @Inject
   @Config("billingBucketUrl")
@@ -99,7 +99,7 @@ public class InvoicingPipeline implements Serializable {
     options.setRunner(DataflowRunner.class);
     // This causes p.run() to stage the pipeline as a template on GCS, as opposed to running it.
     options.setTemplateLocation(invoiceTemplateUrl);
-    options.setStagingLocation(invoiceStagingUrl);
+    options.setStagingLocation(beamStagingUrl);
     Pipeline p = Pipeline.create(options);
     PCollection<BillingEvent> billingEvents =