mirror of
https://github.com/cisagov/manage.get.gov.git
synced 2025-08-01 15:34:53 +02:00
Merge remote-tracking branch 'origin/main' into nl/1064-Change-domain-application-to-domain-requests
commit
4847e66927
2 changed files with 51 additions and 18 deletions
30
docs/operations/runbooks/downtime_incident_management.md
Normal file
@@ -0,0 +1,30 @@
# Downtime Incident Management Runbook

Our team has agreed upon steps for handling incidents that cause our site to go offline or become unusable for users. In this document, an incident is one in which manage.get.gov is offline or displaying HTTP 400/500 errors on all pages. For this document to apply, the cause of the problem must be a critical bug in our code or an outage at one of our providers. It should not be used in response to any type of cyber security incident.

## Response management rules

The following set of rules should be followed while an incident is in progress.

- The person who first notices that the site is down is responsible for using @here and notifying in #dotgov-announce that production is down.
  - This applies to any team member, including new team members and non-developers.
- If no engineer has acknowledged the announcement within 10 minutes, whoever discovered the site was down should call each developer via the Slack DM huddle feature. If there is no response, this should escalate to a phone call.
  - When calling, go down the [phone call list](https://docs.google.com/document/d/1k4r-1MNCfW8EXSXa-tqJQzOvJxQv0ARvHnOjjAH0LII/edit) from top to bottom until someone answers who is available to help.
  - If this incident occurs outside of regular working hours, choosing to help is on a volunteer basis, and answering a call doesn't mean an individual is truly available to assist.
- Once an engineer is online, they should immediately start a huddle in the #dotgov-redalert channel to begin troubleshooting.
  - All available engineers should join the huddle once they see it.
  - If downtime occurs outside of working hours, team members who are off for the day may still be pinged and called but are not required to join if unavailable to do so.
- Uncomment the [banner on get.gov](https://github.com/cisagov/get.gov/blob/0365d3d34b041cc9353497b2b5f81b6ab7fe75a9/_includes/header.html#L9), so it is transparent to users that we know about the issue on manage.get.gov.
  - Designers or developers should be able to make this change; if designers are online and can help with this task, that will allow developers to focus on fixing the bug.

## Post Incident

The following checklist should be followed after the site is back up and running.

- [ ] Message in #dotgov-announce with an @here saying the issue is resolved.
- [ ] Remove the [banner on get.gov](https://github.com/cisagov/get.gov/blob/0365d3d34b041cc9353497b2b5f81b6ab7fe75a9/_includes/header.html#L9) by commenting it out.
- [ ] Write up what happened and when; if the cause is already known, write that as well. This is a draft for internal communications, not for any public-facing site, and can be as simple as a bulleted list.
- [ ] If the cause is not known yet, developers should investigate the issue as the highest priority task.
- [ ] As close to the event as possible, such as the next day, hold an hour-long team incident retro. The goal of this meeting should be to inform all team members what happened and what is being done now, and to collect feedback on what could have been done better. This is where the draft write-up of what happened will be useful.
- [ ] After the retro and once the bug is fully identified, an engineer should assist in writing an incident report, which may be as detailed as needed for future team members to refer to. That document should be placed in the [Incidents folder](https://drive.google.com/drive/folders/1LPVICVpI4Xb5KGdrNkSwhX2OAJ6hYTyu).
- [ ] After creating the document above, the lead engineer should draft the content that will go in the get.gov Incidents section. This Word document should be shared and reviewed by the product team before a developer adds it to get.gov.
@@ -26,6 +26,7 @@ def write_header(writer, columns):
def get_domain_infos(filter_condition, sort_fields):
    domain_infos = (
        DomainInformation.objects.select_related("domain", "authorizing_official")
        .prefetch_related("domain__permissions")
        .filter(**filter_condition)
        .order_by(*sort_fields)
    )
@@ -49,6 +50,7 @@ def parse_row(columns, domain_info: DomainInformation, security_emails_dict=None

    # Domain should never be none when parsing this information
    if domain_info.domain is None:
        logger.error("Attemting to parse row for csv exports but Domain is none in a DomainInfo")
        raise ValueError("Domain is none")

    domain = domain_info.domain  # type: ignore
@@ -127,15 +129,6 @@ def _get_security_emails(sec_contact_ids):
    return security_emails_dict


def update_columns_with_domain_managers(columns, max_dm_count):
    """
    Update the columns list to include "Domain manager email {#}" headers
    based on the maximum domain manager count.
    """
    for i in range(1, max_dm_count + 1):
        columns.append(f"Domain manager email {i}")


def write_csv(
    writer,
    columns,
@@ -161,19 +154,26 @@ def write_csv(
    # Reduce the memory overhead when performing the write operation
    paginator = Paginator(all_domain_infos, 1000)

    if get_domain_managers and len(all_domain_infos) > 0:
        # We want to get the max amont of domain managers an
        # account has to set the column header dynamically
        max_dm_count = max(len(domain_info.domain.permissions.all()) for domain_info in all_domain_infos)
        update_columns_with_domain_managers(columns, max_dm_count)

    if should_write_header:
        write_header(writer, columns)
    # The maximum amount of domain managers an account has
    # We get the max so we can set the column header accurately
    max_dm_count = 0
    total_body_rows = []

    for page_num in paginator.page_range:
        rows = []
        page = paginator.page(page_num)
        for domain_info in page.object_list:

            # Get count of all the domain managers for an account
            if get_domain_managers:
                dm_count = domain_info.domain.permissions.count()
                if dm_count > max_dm_count:
                    max_dm_count = dm_count
                    for i in range(1, max_dm_count + 1):
                        column_name = f"Domain manager email {i}"
                        if column_name not in columns:
                            columns.append(column_name)

            try:
                row = parse_row(columns, domain_info, security_emails_dict, get_domain_managers)
                rows.append(row)
@@ -182,8 +182,11 @@ def write_csv(
                # It indicates that DomainInformation.domain is None.
                logger.error("csv_export -> Error when parsing row, domain was None")
                continue
        total_body_rows.extend(rows)

        writer.writerows(rows)
    if should_write_header:
        write_header(writer, columns)
    writer.writerows(total_body_rows)


def export_data_type_to_csv(csv_file):
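The `write_csv` hunks above buffer every parsed row in `total_body_rows` and write the header last, because the "Domain manager email {i}" columns are only known once all pages have been scanned. A minimal, self-contained sketch of that pattern follows. It is not the repository's code: `write_rows_with_dynamic_columns`, the dict-shaped `records`, and the `managers` key are hypothetical stand-ins for `write_csv`, the `DomainInformation` rows, and `domain.permissions`. Padding short rows to the final width is an addition of this sketch and is not shown in the diff.

```python
import csv
import io


def write_rows_with_dynamic_columns(writer, base_columns, records, page_size=1000):
    """Write a CSV whose trailing columns depend on the data (hypothetical helper).

    `records` is a list of dicts; each may carry a "managers" list of emails.
    """
    columns = list(base_columns)
    max_dm_count = 0
    body_rows = []

    # Walk the data page by page, buffering rows and tracking the widest one.
    for start in range(0, len(records), page_size):
        for record in records[start:start + page_size]:
            managers = record.get("managers", [])
            max_dm_count = max(max_dm_count, len(managers))
            row = [record.get(col, "") for col in base_columns] + list(managers)
            body_rows.append(row)

    # Only now is the full header known.
    for i in range(1, max_dm_count + 1):
        columns.append(f"Domain manager email {i}")

    # Header first, then the buffered rows, padded to the header width.
    writer.writerow(columns)
    width = len(columns)
    writer.writerows(row + [""] * (width - len(row)) for row in body_rows)
    return columns


if __name__ == "__main__":
    buffer = io.StringIO()
    sample = [
        {"Domain name": "a.gov", "managers": ["x@a.gov"]},
        {"Domain name": "b.gov", "managers": ["y@b.gov", "z@b.gov"]},
    ]
    write_rows_with_dynamic_columns(csv.writer(buffer), ["Domain name"], sample, page_size=1)
    print(buffer.getvalue())
```

The trade-off in this sketch is that all rows are held in memory until the header can be written, so paging limits how much is fetched at once but not how much is buffered before the write.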