Commit graph

2 commits

Author SHA1 Message Date
larryruili
f7bc17fbe8 Update input/output of Spec11 pipeline to final format
This changes the BigQuery input to the fields we ultimately want (fqdn,
registrarName, registrarEmailAddress) and the output to a structured POJO
holding the results from the API. This POJO is then converted to its final text output, i.e.:

Map from registrar e-mail to list of threat-detected subdomains:
{"registrarEmail": "c@fake.com", "threats": [{"url": "a.com", "threatType": "MALWARE"}]}
{"registrarEmail": "d@fake.com", "threats": [{"url": "x.com", "threatType": "MALWARE"}, {"url": "y.com", "threatType": "MALWARE"}]}

This gives us all the data we want in a JSON structured format, to be acted upon downstream by the to-be-constructed PublishSpec11ReportAction. Ideally, we would send an e-mail directly from the beam pipeline, but this is only possible through third-party providers (as opposed to app engine itself).

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=209416880
2018-08-20 14:26:46 -04:00
larryruili
d199b383e5 Add preliminary spec11 monthly pipeline
This adds the scaffolding for a basic Spec11 pipeline- it gathers all domains from all time for a given project and counts how many there are. I've factored out a few common utilities for beam pipelines to avoid excessive duplication.

Future CLs will:
- Actually process domains via the SafeBrowsing API
- Generate a real spec11 report
- Template queries based on the input YearMonth
- Abstract more commonalities across beam pipelines to reduce boilerplate when adding new pipelines.

TESTED: FOSS test passed, and ran successfully on alpha

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=205997741
2018-08-10 13:44:25 -04:00