An automated rollback tool for Nomulus (#847)

* An automated rollback tool for Nomulus

A tool that directs traffic between deployed versions. It handles the
conversion between Nomulus tags and AppEngine versions, executes schema
compatibility tests, ensures that steps are executed in the correct order,
and updates deployment records appropriately.
Weimin Yu 2020-10-29 10:37:20 -04:00 committed by GitHub
parent 478064f32b
commit db2e896d42
11 changed files with 1552 additions and 0 deletions

release/rollback/README.md

@ -0,0 +1,151 @@
## Summary
This package contains an automated rollback tool for the Nomulus server on
AppEngine. When given the Nomulus tag of a deployed release, the tool directs
all traffic in the four recognized services (backend, default, pubapi, and
tools) to that release. In the process, it handles Nomulus tag to AppEngine
version ID translation, checks the target binary's compatibility with the SQL
schema, starts/stops versions and redirects traffic in proper sequence, and
updates deployment metadata appropriately.
The tool has two limitations:
1. The tool accepts only one release tag as the rollback target, which is
applied to all services.
2. The tool immediately migrates all traffic to the new versions. It does not
support gradual migration. This is not an issue now since gradual migration
is only available in automatically scaled versions, while none of the
versions use automatic scaling.
Although this tool is named a rollback tool, it can also reverse a rollback,
that is, rolling forward to a newer release.
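The tag-to-version translation mentioned above can be pictured with a minimal sketch. It assumes the deployment record is a text file whose lines have the form `{RELEASE_TAG},{APP_ENGINE_SERVICE_ID},{APP_ENGINE_VERSION}`, as described in the docstrings of gcs.py in this package. The function name `versions_for_release` and the sample data are illustrative only and are not part of the tool.

```python
from typing import Dict


def versions_for_release(file_content: str, tag: str) -> Dict[str, str]:
    """Maps each service to the AppEngine version recorded for `tag`.

    Hypothetical helper. If a release was deployed more than once to a
    service, this sketch keeps only the last recorded version.
    """
    result = {}
    for line in file_content.splitlines():
        if not line.strip():
            continue  # Tolerate blank lines in the sample input.
        release, service_id, version_id = line.split(',')
        if release == tag:
            result[service_id] = version_id
    return result


# Made-up sample content in the assumed three-column format.
sample = ('nomulus-20201012-RC00,backend,nomulus-v048\n'
          'nomulus-20201019-RC00,backend,nomulus-v049\n'
          'nomulus-20201019-RC00,default,nomulus-v049\n')
print(versions_for_release(sample, 'nomulus-20201019-RC00'))
```

The actual tool reads this file from GCS and works with (service, version) keys rather than a plain dictionary.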
## Prerequisites
This tool requires Python 3.7 or later. It also requires two GCP client
libraries: google-cloud-storage and google-api-python-client. They can be
installed using pip.
Registry team members should use either non-sudo pip3 or virtualenv/venv to
install the GCP libraries. A 'sudo pip install' may interfere with the Linux
tooling on your corp desktop. The non-sudo 'pip3 install' command installs the
libraries under $HOME/.local. The virtualenv or venv methods allow more control
over the installation location.
Below is an example of using virtualenv to install the libraries:
```shell
sudo apt-get install virtualenv python3-venv
python3 -m venv myproject
source myproject/bin/activate
pip install google-cloud-storage
pip install google-api-python-client
deactivate
```
If using virtualenv, make sure to run 'source myproject/bin/activate' before
running the rollback script.
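Before running the tool, the prerequisites can be checked with a short script such as the sketch below. It assumes that the two libraries are importable as `google.cloud.storage` and `googleapiclient` (the module names imported by the Python files in this package). The helper names `missing_modules` and `check_prerequisites` are hypothetical and not part of the tool.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Returns the module names that cannot be located (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised when a parent package (e.g. 'google.cloud') is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


def check_prerequisites() -> List[str]:
    """Returns a list of human-readable problems; empty if all is well."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append('Python 3.7+ is required.')
    for name in missing_modules(['google.cloud.storage', 'googleapiclient']):
        problems.append(f'Missing module: {name}')
    return problems


print(check_prerequisites() or 'All prerequisites found.')
```

Run it with the same interpreter (and the same activated virtual environment) that will run the rollback script.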
## Usage
The tool can be invoked using the rollback_tool script in the Nomulus root
directory. The following parameters may be requested:
* dev_project: This is the GCP project that hosts the release and deployment
infrastructure, including the Spinnaker pipelines.
* project: This is the GCP project that hosts the Nomulus server to be rolled
back.
* env: This is the name of the Nomulus environment, e.g., sandbox or
production. Although the project-to-environment mapping is available in
Gradle scripts and internal configuration files, it is not easy to extract.
Therefore, we require the user to provide it for now.
A typical workflow goes as follows:
### Check Which Release is Serving
From the Nomulus root directory:
```shell
rollback_tool show_serving_release --dev_project ... --project ... --env ...
```
The output may look like:
```
backend nomulus-v049 nomulus-20201019-RC00
default nomulus-v049 nomulus-20201019-RC00
pubapi nomulus-v049 nomulus-20201019-RC00
tools nomulus-v049 nomulus-20201019-RC00
```
### Review Recent Deployments
```shell
rollback_tool show_recent_deployments --dev_project ... --project ... --env ...
```
This command displays up to the 3 most recent deployments. The output (from
sandbox, which has only two tracked deployments as of the writing of this
document) may look like:
```
backend nomulus-v048 nomulus-20201012-RC00
default nomulus-v048 nomulus-20201012-RC00
pubapi nomulus-v048 nomulus-20201012-RC00
tools nomulus-v048 nomulus-20201012-RC00
backend nomulus-v049 nomulus-20201019-RC00
default nomulus-v049 nomulus-20201019-RC00
pubapi nomulus-v049 nomulus-20201019-RC00
tools nomulus-v049 nomulus-20201019-RC00
```
### Roll to the Target Release
```shell
rollback_tool rollback --dev_project ... --project ... --env ... \
--target_release {YOUR_CHOSEN_TAG} --run_mode ...
```
The rollback subcommand has two additional parameters:
* target_release: This is the Nomulus tag of the target release, in the form
of nomulus-YYYYMMDD-RC[0-9][0-9]
* run_mode: This is the execution mode of the rollback action. There are three
modes:
1. dryrun: The tool will only output information about every step of the
rollback, including commands that a user can copy and run elsewhere.
2. interactive: The tool will prompt the user before executing each step.
The user may choose to abort the rollback, skip the step, or continue
with the step.
3. automatic: The tool will execute all steps in one shot.
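The three run modes can be illustrated with a minimal sketch. This is not the tool's implementation: steps are represented as plain strings, "executing" a step only prints it, and the function name `run_steps` and the one-letter answers are assumptions made for the example.

```python
from typing import Callable, Iterable, List


def run_steps(steps: Iterable[str],
              mode: str,
              ask: Callable[[str], str] = input) -> List[str]:
    """Returns the steps that were executed under the given run mode."""
    executed = []
    for step in steps:
        if mode == 'dryrun':
            # Only describe the step; nothing is executed.
            print(f'[dryrun] {step}')
            continue
        if mode == 'interactive':
            answer = ask(f'{step} [c]ontinue/[s]kip/[a]bort? ').strip().lower()
            if answer == 'a':
                break  # Abort the whole rollback.
            if answer == 's':
                continue  # Skip this step only.
        elif mode != 'automatic':
            raise ValueError(f'Unknown run mode: {mode}')
        print(f'Executing: {step}')  # A real tool would run the step here.
        executed.append(step)
    return executed


run_steps(['start backend', 'direct traffic to backend'], 'dryrun')
```

Passing the prompt function as a parameter keeps the interactive mode testable without a terminal.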
The rollback steps are organized according to the following logic:
```
for service in ['backend', 'default', 'pubapi', 'tools']:
    if service is on basicScaling: (See Notes #1)
        start the target version
    if service is on manualScaling:
        start the target version
        set num_instances to its originally configured value
for service in ['backend', 'default', 'pubapi', 'tools']:
    direct traffic to target version
for service in ['backend', 'default', 'pubapi', 'tools']:
    if originally serving version is not the target version:
        if originally serving version is on basicScaling:
            stop the version
        if originally serving version is on manualScaling:
            stop the version
            set num_instances to 1 (See Notes #2)
```
Notes:
1. Versions on automatic scaling cannot be started or stopped by gcloud or the
AppEngine Admin REST API.
2. The minimum value assignable to num_instances through the REST API is 1.
This instance eventually will be released too.
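The ordering above can be sketched in Python as follows. This is a simplified illustration rather than the tool's planner: it assumes one scaling type per service (the real tool tracks scaling per version), a single version ID shared by all services, and it emits human-readable strings instead of executable steps. The function name `plan_steps` and its parameters are hypothetical.

```python
from typing import Dict, List

SERVICES = ['backend', 'default', 'pubapi', 'tools']


def plan_steps(target: str, serving: Dict[str, str],
               scaling: Dict[str, str],
               instances: Dict[str, int]) -> List[str]:
    """Lists rollback steps in the order described above."""
    steps = []
    # Phase 1: bring up the target version in every service.
    for service in SERVICES:
        if scaling[service] in ('basicScaling', 'manualScaling'):
            steps.append(f'start {service}/{target}')
        if scaling[service] == 'manualScaling':
            steps.append(f'set num_instances of {service}/{target} '
                         f'to {instances[service]}')
    # Phase 2: move all traffic to the target version.
    for service in SERVICES:
        steps.append(f'direct traffic of {service} to {target}')
    # Phase 3: shut down the versions that were serving before.
    for service in SERVICES:
        old = serving[service]
        if old == target:
            continue
        if scaling[service] in ('basicScaling', 'manualScaling'):
            steps.append(f'stop {service}/{old}')
        if scaling[service] == 'manualScaling':
            steps.append(f'set num_instances of {service}/{old} to 1')
    return steps


example = plan_steps(
    target='nomulus-v048',
    serving={s: 'nomulus-v049' for s in SERVICES},
    scaling={'backend': 'manualScaling', 'default': 'basicScaling',
             'pubapi': 'basicScaling', 'tools': 'basicScaling'},
    instances={'backend': 2})
print('\n'.join(example))
```

Completing each phase for all services before starting the next means traffic is redirected only after every target version has been started.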


@ -0,0 +1,198 @@
# Copyright 2020 The Nomulus Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Helper for using the AppEngine Admin REST API."""
import time
from typing import Any, Dict, FrozenSet, Set
from googleapiclient import discovery
from googleapiclient import http
import common
# AppEngine services under management.
SERVICES = frozenset(['backend', 'default', 'pubapi', 'tools'])
# Forces 'list' calls (for services and versions) to return all
# results in one shot, to avoid having to handle pagination. This value
# should be greater than the maximum allowed services and versions in any
# project (
# https://cloud.google.com/appengine/docs/standard/python/an-overview-of-app-engine#limits).
_PAGE_SIZE = 250
# Number of times to check the status of an operation before timing out.
_STATUS_CHECK_TIMES = 5
# Delay between status checks of a long-running operation, in seconds
_STATUS_CHECK_INTERVAL = 5
class PagingError(Exception):
"""Error for unexpected partial results.
List calls in this module do not handle pagination. This error is raised
when a partial result is received.
"""
def __init__(self, uri: str):
super().__init__(
f'Received paged response unexpectedly when calling {uri}. '
'Consider increasing _PAGE_SIZE.')
class AppEngineAdmin:
"""Wrapper around the AppEngine Admin REST API client.
This class provides wrapper methods around the REST API for service and
version queries and for migrating between versions.
"""
def __init__(self,
project: str,
service_lookup: discovery.Resource = None,
status_check_interval: int = _STATUS_CHECK_INTERVAL) -> None:
"""Initialize this instance for an AppEngine(GCP) project."""
self._project = project
if service_lookup is not None:
apps = service_lookup.apps()
else:
apps = discovery.build('appengine', 'v1beta').apps()
self._services = apps.services()
self._operations = apps.operations()
self._status_check_interval = status_check_interval
@property
def project(self):
return self._project
def _checked_request(self, request: http.HttpRequest) -> Dict[str, Any]:
"""Verifies that all results are returned for a request."""
response = request.execute()
if 'nextPageToken' in response:
raise PagingError(request.uri)
return response
def get_serving_versions(self) -> FrozenSet[common.VersionKey]:
"""Returns the serving versions of every Nomulus service.
For each service in appengine.SERVICES, gets the version(s) actually
serving traffic. Services with the 'SERVING' status but no allocated
traffic are not included. Services not included in appengine.SERVICES
are also ignored.
Returns: An immutable collection of the serving versions grouped by
service.
"""
response = self._checked_request(
self._services.list(appsId=self._project, pageSize=_PAGE_SIZE))
# Response format is specified at
# http://googleapis.github.io/google-api-python-client/docs/dyn/appengine_v1beta5.apps.services.html#list.
versions = []
for service in response.get('services', []):
if service['id'] in SERVICES:
# yapf: disable
versions_with_traffic = (
service.get('split', {}).get('allocations', {}).keys())
# yapf: enable
for version in versions_with_traffic:
versions.append(common.VersionKey(service['id'], version))
return frozenset(versions)
# yapf: disable # argument indent wrong
def get_version_configs(
self, versions: Set[common.VersionKey]
) -> FrozenSet[common.VersionConfig]:
# yapf: enable
"""Returns the configuration of requested versions.
For each version in the request, gets the rollback-related data from
its static configuration (found in appengine-web.xml).
Args:
versions: A set of VersionKey objects, each identifying a version
being queried in its service.
Returns:
The version configurations in an immutable set.
"""
requested_services = {version.service_id for version in versions}
version_configs = []
# Sort the requested services for ease of testing. For now the mocked
# AppEngine admin in appengine_test can only respond in a fixed order.
for service_id in sorted(requested_services):
response = self._checked_request(self._services.versions().list(
appsId=self._project,
servicesId=service_id,
pageSize=_PAGE_SIZE))
# Format of version_list is defined at
# https://googleapis.github.io/google-api-python-client/docs/dyn/appengine_v1beta5.apps.services.versions.html#list.
for version in response.get('versions', []):
if common.VersionKey(service_id, version['id']) in versions:
scalings = [
s for s in list(common.AppEngineScaling)
if s.value in version
]
if len(scalings) != 1:
raise common.CannotRollbackError(
f'Expecting exactly one scaling, found {scalings}')
scaling = common.AppEngineScaling(list(scalings)[0])
if scaling == common.AppEngineScaling.MANUAL:
manual_instances = version.get(
scaling.value).get('instances')
else:
manual_instances = None
version_configs.append(
common.VersionConfig(service_id, version['id'],
scaling, manual_instances))
return frozenset(version_configs)
def set_manual_scaling_num_instance(self, service_id: str, version_id: str,
manual_instances: int) -> None:
"""Sets the number of instances of a version on manual scaling."""
update_mask = 'manualScaling.instances'
body = {'manualScaling': {'instances': manual_instances}}
response = self._services.versions().patch(appsId=self._project,
servicesId=service_id,
versionsId=version_id,
updateMask=update_mask,
body=body).execute()
operation_id = response.get('name').split('operations/')[1]
for _ in range(_STATUS_CHECK_TIMES):
if self.query_operation_status(operation_id):
return
time.sleep(self._status_check_interval)
raise common.CannotRollbackError(
f'Operation {operation_id} timed out.')
def query_operation_status(self, operation_id):
response = self._operations.get(appsId=self._project,
operationsId=operation_id).execute()
if response.get('response') is not None:
return True
if response.get('error') is not None:
raise common.CannotRollbackError(response['error'])
assert not response.get('done'), 'Operation done but no results.'
return False


@ -0,0 +1,133 @@
# Copyright 2020 The Nomulus Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Unit tests for appengine."""
from typing import Any, Dict, List, Tuple, Union
import unittest
from unittest import mock
from unittest.mock import patch
import appengine
import common
def setup_appengine_admin() -> Tuple[object, object]:
"""Helper for setting up a mocked AppEngineAdmin instance.
Returns:
An AppEngineAdmin instance and a request with which API responses can
be mocked.
"""
# Assign mocked API response to mock_request.execute.
mock_request = mock.MagicMock()
mock_request.uri.return_value = 'myuri'
# Mocked resource shared by services, versions, and operations.
resource = mock.MagicMock()
resource.list.return_value = mock_request
resource.get.return_value = mock_request
resource.patch.return_value = mock_request
# Root resource of AppEngine API. Exact type unknown.
apps = mock.MagicMock()
apps.services.return_value = resource
resource.versions.return_value = resource
apps.operations.return_value = resource
service_lookup = mock.MagicMock()
service_lookup.apps.return_value = apps
appengine_admin = appengine.AppEngineAdmin('project', service_lookup, 1)
return (appengine_admin, mock_request)
class AppEngineTestCase(unittest.TestCase):
"""Unit tests for appengine."""
def setUp(self) -> None:
self._client, self._mock_request = setup_appengine_admin()
self.addCleanup(patch.stopall)
# yapf: disable
def _set_mocked_response(
self,
responses: Union[Dict[str, Any], List[Dict[str, Any]]]) -> None:
# yapf: enable
if isinstance(responses, list):
self._mock_request.execute.side_effect = responses
else:
self._mock_request.execute.return_value = responses
def test_checked_request_multipage_raises(self) -> None:
self._set_mocked_response({'nextPageToken': ''})
self.assertRaises(appengine.PagingError,
self._client.get_serving_versions)
def test_get_serving_versions(self) -> None:
self._set_mocked_response({
'services': [{
'split': {
'allocations': {
'my_version': 3.14,
}
},
'id': 'pubapi'
}, {
'split': {
'allocations': {
'another_version': 2.71,
}
},
'id': 'error_dashboard'
}]
})
self.assertEqual(
self._client.get_serving_versions(),
frozenset([common.VersionKey('pubapi', 'my_version')]))
def test_get_version_configs(self):
self._set_mocked_response({
'versions': [{
'basicScaling': {
'maxInstances': 10
},
'id': 'version'
}]
})
self.assertEqual(
self._client.get_version_configs(
frozenset([common.VersionKey('default', 'version')])),
frozenset([
common.VersionConfig('default', 'version',
common.AppEngineScaling.BASIC)
]))
def test_async_update(self):
self._set_mocked_response([
{
'name': 'project/operations/op_id',
'done': False
},
{
'name': 'project/operations/op_id',
'done': False
},
{
'name': 'project/operations/op_id',
'response': {},
'done': True
},
])
self._client.set_manual_scaling_num_instance('service', 'version', 1)
self.assertEqual(self._mock_request.execute.call_count, 3)
if __name__ == '__main__':
unittest.main()

release/rollback/common.py

@ -0,0 +1,111 @@
# Copyright 2020 The Nomulus Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Declares data types that describe AppEngine services and versions."""
import dataclasses
import enum
import pathlib
import re
from typing import Optional
class CannotRollbackError(Exception):
"""Indicates that rollback cannot be done by this tool.
This error is for situations where rollbacks are either not allowed or
cannot be planned. Example scenarios include:
- The target release is incompatible with the SQL schema.
- The target release has never been deployed to AppEngine.
- The target release is no longer available, e.g., has been manually
deleted by the operators.
- A state-changing call to AppEngine Admin API has failed.
User must manually fix such problems before trying again to roll back.
"""
pass
class AppEngineScaling(enum.Enum):
"""Types of scaling schemes supported in AppEngine.
The value of each name is the property name in the REST API requests and
responses.
"""
AUTOMATIC = 'automaticScaling'
BASIC = 'basicScaling'
MANUAL = 'manualScaling'
@dataclasses.dataclass(frozen=True)
class VersionKey:
"""Identifier of a deployed version on AppEngine.
AppEngine versions as deployable units are managed on per-service basis.
Each instance of this class uniquely identifies an AppEngine version.
This class implements the __eq__ method so that its equality property
applies to subclasses by default unless they override it.
"""
service_id: str
version_id: str
def __eq__(self, other):
return (isinstance(other, VersionKey)
and self.service_id == other.service_id
and self.version_id == other.version_id)
@dataclasses.dataclass(frozen=True, eq=False)
class VersionConfig(VersionKey):
"""Rollback-related static configuration of an AppEngine version.
Contains data found in the appengine-web.xml for this version.
Attributes:
scaling: The scaling scheme of this version. This value determines what
steps are needed for the rollback. If a version is on automatic
scaling, we only need to direct traffic to it or away from it. The
version cannot be started, stopped, or have its number of instances
updated. If a version is on manual scaling, it not only needs to be
started or stopped explicitly, its instances need to be updated too
(to 1, the lowest allowed number) when it is shutdown, and to its
originally configured number of VM instances when brought up.
manual_scaling_instances: The originally configured VM instances to use
for each version that is on manual scaling.
"""
scaling: AppEngineScaling
manual_scaling_instances: Optional[int] = None
def get_nomulus_root() -> str:
"""Finds the current Nomulus root directory.
Returns:
The absolute path to the Nomulus root directory.
"""
for folder in pathlib.Path(__file__).parents:
if folder.name != 'nomulus':
continue
if not folder.joinpath('settings.gradle').exists():
continue
with open(folder.joinpath('settings.gradle'), 'r') as file:
for line in file:
if re.match(r"^rootProject.name\s*=\s*'nomulus'\s*$", line):
return folder.absolute()
raise RuntimeError(
'Do not move this file out of the Nomulus directory tree.')

release/rollback/gcs.py

@ -0,0 +1,148 @@
# Copyright 2020 The Nomulus Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Helper for managing Nomulus deployment records on GCS."""
from typing import Dict, FrozenSet, Set
from google.cloud import storage
import common
def _get_version_map_name(env: str):
return f'nomulus.{env}.versions'
def _get_schema_tag_file(env: str):
return f'sql.{env}.tag'
class GcsClient:
"""Manages Nomulus deployment records on GCS."""
def __init__(self, project: str, gcs_client=None) -> None:
"""Initializes the instance for a GCP project.
Args:
project: The GCP project with Nomulus deployment records.
gcs_client: Optional API client to use.
"""
self._project = project
if gcs_client is not None:
self._client = gcs_client
else:
self._client = storage.Client(self._project)
@property
def project(self):
return self._project
def _get_deploy_bucket_name(self):
return f'{self._project}-deployed-tags'
def _get_release_to_version_mapping(
self, env: str) -> Dict[common.VersionKey, str]:
"""Returns the content of the release to version mapping file.
File content is returned in utf-8 encoding. Each line in the file is
in this format:
'{RELEASE_TAG},{APP_ENGINE_SERVICE_ID},{APP_ENGINE_VERSION}'.
"""
file_content = self._client.get_bucket(
self._get_deploy_bucket_name()).get_blob(
_get_version_map_name(env)).download_as_text()
mapping = {}
for line in file_content.splitlines(False):
tag, service_id, version_id = line.split(',')
mapping[common.VersionKey(service_id, version_id)] = tag
return mapping
def get_versions_by_release(self, env: str,
nom_tag: str) -> FrozenSet[common.VersionKey]:
"""Returns AppEngine version ids of a given Nomulus release tag.
Fetches the version mapping file maintained by the deployment process
and parses its content into a collection of VersionKey instances.
A release may map to multiple versions in a service if it has been
deployed multiple times. This is not intended behavior and may only
happen by mistake.
Args:
env: The environment of the deployed release, e.g., sandbox.
nom_tag: The Nomulus release tag.
Returns:
An immutable set of VersionKey instances.
"""
mapping = self._get_release_to_version_mapping(env)
return frozenset(
[version for version in mapping if mapping[version] == nom_tag])
def get_releases_by_versions(
self, env: str,
versions: Set[common.VersionKey]) -> Dict[common.VersionKey, str]:
"""Gets the release tags of the AppEngine versions.
Args:
env: The environment of the deployed release, e.g., sandbox.
versions: The AppEngine versions.
Returns:
A mapping of versions to release tags.
"""
mapping = self._get_release_to_version_mapping(env)
return {
version: tag
for version, tag in mapping.items() if version in versions
}
def get_recent_deployments(
self, env: str, num_records: int) -> Dict[common.VersionKey, str]:
"""Gets the most recent deployment records.
Deployment records are stored in a file, with one line per service.
Caller should adjust num_records according to the number of services
in AppEngine.
Args:
env: The environment of the deployed release, e.g., sandbox.
num_records: the number of lines to go back.
"""
file_content = self._client.get_bucket(
self._get_deploy_bucket_name()).get_blob(
_get_version_map_name(env)).download_as_text()
mapping = {}
for line in file_content.splitlines(False)[-num_records:]:
tag, service_id, version_id = line.split(',')
mapping[common.VersionKey(service_id, version_id)] = tag
return mapping
def get_schema_tag(self, env: str) -> str:
"""Gets the release tag of the SQL schema in the given environment.
This tag is needed for the server/schema compatibility test.
"""
file_content = self._client.get_bucket(
self._get_deploy_bucket_name()).get_blob(
_get_schema_tag_file(env)).download_as_text().splitlines(False)
assert len(
file_content
) == 1, f'Unexpected content in {_get_schema_tag_file(env)}.'
return file_content[0]


@ -0,0 +1,152 @@
# Copyright 2020 The Nomulus Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Unit tests for gcs."""
import textwrap
import unittest
from unittest import mock
import common
import gcs
def setup_gcs_client(env: str):
"""Sets up a mocked GcsClient.
Args:
env: Name of the Nomulus environment.
Returns:
A GcsClient instance and two mocked blobs representing the schema
tag file and the version map file on GCS.
"""
schema_tag_blob = mock.MagicMock()
schema_tag_blob.download_as_text.return_value = 'tag\n'
version_map_blob = mock.MagicMock()
blobs_by_name = {
f'nomulus.{env}.versions': version_map_blob,
f'sql.{env}.tag': schema_tag_blob
}
bucket = mock.MagicMock()
bucket.get_blob.side_effect = lambda blob_name: blobs_by_name[blob_name]
google_client = mock.MagicMock()
google_client.get_bucket.return_value = bucket
gcs_client = gcs.GcsClient('project', google_client)
return (gcs_client, schema_tag_blob, version_map_blob)
class GcsTestCase(unittest.TestCase):
"""Unit tests for gcs."""
_ENV = 'crash'
def setUp(self) -> None:
self._client, self._schema_tag_blob, self._version_map_blob = \
setup_gcs_client(self._ENV)
self.addCleanup(mock.patch.stopall)
def test_get_schema_tag(self):
self.assertEqual(self._client.get_schema_tag(self._ENV), 'tag')
def test_get_versions_by_release(self):
self._version_map_blob.download_as_text.return_value = \
'nomulus-20200925-RC02,backend,nomulus-backend-v008'
self.assertEqual(
self._client.get_versions_by_release(self._ENV,
'nomulus-20200925-RC02'),
frozenset([common.VersionKey('backend', 'nomulus-backend-v008')]))
def test_get_versions_by_release_not_found(self):
self._version_map_blob.download_as_text.return_value = \
'nomulus-20200925-RC02,backend,nomulus-backend-v008'
self.assertEqual(
self._client.get_versions_by_release(self._ENV, 'no-such-tag'),
frozenset([]))
def test_get_versions_by_release_multiple_service(self):
self._version_map_blob.download_as_text.return_value = textwrap.dedent(
"""\
nomulus-20200925-RC02,backend,nomulus-backend-v008
nomulus-20200925-RC02,default,nomulus-default-v008
""")
self.assertEqual(
self._client.get_versions_by_release(self._ENV,
'nomulus-20200925-RC02'),
frozenset([
common.VersionKey('backend', 'nomulus-backend-v008'),
common.VersionKey('default', 'nomulus-default-v008')
]))
def test_get_versions_by_release_multiple_deployment(self):
self._version_map_blob.download_as_text.return_value = textwrap.dedent(
"""\
nomulus-20200925-RC02,backend,nomulus-backend-v008
nomulus-20200925-RC02,backend,nomulus-backend-v018
""")
self.assertEqual(
self._client.get_versions_by_release(self._ENV,
'nomulus-20200925-RC02'),
frozenset([
common.VersionKey('backend', 'nomulus-backend-v008'),
common.VersionKey('backend', 'nomulus-backend-v018')
]))
def test_get_releases_by_versions(self):
self._version_map_blob.download_as_text.return_value = textwrap.dedent(
"""\
nomulus-20200925-RC02,backend,nomulus-backend-v008
nomulus-20200925-RC02,default,nomulus-default-v008
""")
self.assertEqual(
self._client.get_releases_by_versions(
self._ENV, {
common.VersionKey('backend', 'nomulus-backend-v008'),
common.VersionKey('default', 'nomulus-default-v008')
}), {
common.VersionKey('backend', 'nomulus-backend-v008'):
'nomulus-20200925-RC02',
common.VersionKey('default', 'nomulus-default-v008'):
'nomulus-20200925-RC02',
})
def test_get_recent_deployments(self):
file_content = textwrap.dedent("""\
nomulus-20200925-RC02,backend,nomulus-backend-v008
nomulus-20200925-RC02,default,nomulus-default-v008
""")
self._version_map_blob.download_as_text.return_value = file_content
self.assertEqual(
self._client.get_recent_deployments(self._ENV, 2), {
common.VersionKey('default', 'nomulus-default-v008'):
'nomulus-20200925-RC02',
common.VersionKey('backend', 'nomulus-backend-v008'):
'nomulus-20200925-RC02'
})
def test_get_recent_deployments_fewer_lines(self):
self._version_map_blob.download_as_text.return_value = textwrap.dedent(
"""\
nomulus-20200925-RC02,backend,nomulus-backend-v008
nomulus-20200925-RC02,default,nomulus-default-v008
""")
self.assertEqual(
self._client.get_recent_deployments(self._ENV, 1), {
common.VersionKey('default', 'nomulus-default-v008'):
'nomulus-20200925-RC02'
})
if __name__ == '__main__':
unittest.main()

release/rollback/plan.py

@ -0,0 +1,195 @@
# Copyright 2020 The Nomulus Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Generates a sequence of operations for execution."""
from typing import FrozenSet, Tuple
import appengine
import common
import dataclasses
import gcs
import steps
@dataclasses.dataclass(frozen=True)
class ServiceRollback:
"""Data needed for rolling back one service.
Holds the configurations of both the currently serving version(s) and the
rollback target in a service.
Attributes:
target_version: The version to roll back to.
serving_versions: The currently serving versions to be stopped. This
set may be empty. It may also have multiple versions (when traffic
is split).
"""
target_version: common.VersionConfig
serving_versions: FrozenSet[common.VersionConfig]
def __post_init__(self):
"""Validates that all versions are for the same service."""
if self.serving_versions:
for config in self.serving_versions:
assert config.service_id == self.target_version.service_id
# yapf: disable
def _get_service_rollback_plan(
target_configs: FrozenSet[common.VersionConfig],
serving_configs: FrozenSet[common.VersionConfig]
) -> Tuple[ServiceRollback, ...]:
# yapf: enable
"""Determines the versions to bring up/down in each service.
In each service, this method makes sure that at least one version is found
for the rollback target. If multiple versions are found, which may only
happen if the target release was deployed multiple times, one of them is
chosen arbitrarily.
If a target version is already serving traffic, instead of checking if it
gets 100 percent of traffic, this method still generates operations to
start it and direct all traffic to it. This is not a problem since these
operations are idempotent.
Args:
target_configs: The rollback target versions in each managed service
(as defined in appengine.SERVICES).
serving_configs: The currently serving versions in each service.
Raises:
CannotRollbackError: Rollback is impossible because a target version
cannot be found for some service.
Returns:
For each service, the versions to bring up/down if applicable.
"""
targets_by_service = {}
for version in target_configs:
targets_by_service.setdefault(version.service_id, set()).add(version)
serving_by_service = {}
for version in serving_configs:
serving_by_service.setdefault(version.service_id, set()).add(version)
# The target_configs parameter only has configs for managed services.
# Since targets_by_service is derived from it, its key set should be equal
# to appengine.SERVICES.
if targets_by_service.keys() != appengine.SERVICES:
cannot_rollback = appengine.SERVICES.difference(
targets_by_service.keys())
raise common.CannotRollbackError(
f'Target version(s) not found for {cannot_rollback}')
plan = []
for service_id, versions in targets_by_service.items():
# Use a distinct name to avoid shadowing the serving_configs parameter.
serving_in_service = serving_by_service.get(service_id, set())
versions_to_stop = serving_in_service.difference(versions)
chosen_target = list(versions)[0]
plan.append(ServiceRollback(chosen_target,
frozenset(versions_to_stop)))
return tuple(plan)
# yapf: disable
def _generate_steps(
gcs_client: gcs.GcsClient,
appengine_admin: appengine.AppEngineAdmin,
env: str,
target_release: str,
rollback_plan: Tuple[ServiceRollback, ...]
) -> Tuple[steps.RollbackStep, ...]:
# yapf: enable
"""Generates the sequence of operations for execution.
A rollback consists of the following steps:
1. Run schema compatibility test for the target release.
2. For each service,
a. If the target version does not use automatic scaling, start it.
i. If the target version uses manual scaling, set its instances to the
configured value.
b. If the target version uses automatic scaling, do nothing.
3. For each service, immediately direct all traffic to the target version.
4. For each service, go over its versions to be stopped:
a. If a version uses automatic scaling, do nothing.
b. If a version does not use automatic scaling, stop it.
i. If a version uses manual scaling, set its instances to 1 (one, the
lowest value allowed on the REST API) to release the instances.
5. Update the appropriate deployed tag file on GCS with the target release
tag.
Returns:
The sequence of operations to execute for rollback.
"""
rollback_steps = [
steps.check_schema_compatibility(gcs_client.project, target_release,
gcs_client.get_schema_tag(env))
]
for plan in rollback_plan:
if plan.target_version.scaling != common.AppEngineScaling.AUTOMATIC:
rollback_steps.append(
steps.start_or_stop_version(appengine_admin.project, 'start',
plan.target_version))
if plan.target_version.scaling == common.AppEngineScaling.MANUAL:
rollback_steps.append(
steps.set_manual_scaling_instances(
appengine_admin, plan.target_version,
plan.target_version.manual_scaling_instances))
for plan in rollback_plan:
rollback_steps.append(
steps.direct_service_traffic_to_version(appengine_admin.project,
plan.target_version))
for plan in rollback_plan:
for version in plan.serving_versions:
# Check the scaling type of the version being stopped, not that of the
# target version.
if version.scaling != common.AppEngineScaling.AUTOMATIC:
rollback_steps.append(
steps.start_or_stop_version(appengine_admin.project,
'stop', version))
if version.scaling == common.AppEngineScaling.MANUAL:
# Release all but one instance. The API does not allow setting
# num_instances to 0.
rollback_steps.append(
steps.set_manual_scaling_instances(appengine_admin,
version, 1))
rollback_steps.append(
steps.update_deploy_tags(gcs_client.project, env, target_release))
return tuple(rollback_steps)
def get_rollback_plan(gcs_client: gcs.GcsClient,
appengine_admin: appengine.AppEngineAdmin, env: str,
target_release: str) -> Tuple[steps.RollbackStep, ...]:
"""Generates the sequence of rollback operations for execution."""
target_versions = gcs_client.get_versions_by_release(env, target_release)
serving_versions = appengine_admin.get_serving_versions()
all_version_configs = appengine_admin.get_version_configs(
target_versions.union(serving_versions))
target_configs = frozenset([
config for config in all_version_configs if config in target_versions
])
serving_configs = frozenset([
config for config in all_version_configs if config in serving_versions
])
rollback_plan = _get_service_rollback_plan(target_configs, serving_configs)
return _generate_steps(gcs_client, appengine_admin, env, target_release,
rollback_plan)
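
The ordering described in the `_generate_steps` docstring can be sketched in isolation. This is a minimal illustration, not the tool's implementation: `Scaling`, `Version`, and `order_steps` are hypothetical stand-ins for `common.AppEngineScaling`, `common.VersionConfig`, and the step factories, and each step is reduced to a plain string.

```python
import enum
from typing import List, NamedTuple, Tuple


class Scaling(enum.Enum):
    """Hypothetical stand-in for common.AppEngineScaling."""
    AUTOMATIC = 'automatic'
    BASIC = 'basic'
    MANUAL = 'manual'


class Version(NamedTuple):
    """Hypothetical stand-in for common.VersionConfig."""
    service_id: str
    version_id: str
    scaling: Scaling


def order_steps(plan: List[Tuple[Version, List[Version]]]) -> List[str]:
    """Returns step descriptions in the order given by the docstring."""
    steps = ['check schema']
    # Phase 1: bring up every target version that is not auto-scaled.
    for target, _ in plan:
        if target.scaling != Scaling.AUTOMATIC:
            steps.append(f'start {target.version_id}')
        if target.scaling == Scaling.MANUAL:
            steps.append(f'set instances of {target.version_id}')
    # Phase 2: move traffic only after all targets are up.
    for target, _ in plan:
        steps.append(f'set-traffic {target.service_id}={target.version_id}')
    # Phase 3: stop the old versions, based on each old version's scaling.
    for _, to_stop in plan:
        for version in to_stop:
            if version.scaling != Scaling.AUTOMATIC:
                steps.append(f'stop {version.version_id}')
            if version.scaling == Scaling.MANUAL:
                steps.append(f'set instances of {version.version_id} to 1')
    steps.append('update deployed tag')
    return steps


# Mirrors the scaling types used in the end-to-end test.
demo_plan = [
    (Version('backend', 'b-v009', Scaling.BASIC),
     [Version('backend', 'b-v011', Scaling.BASIC)]),
    (Version('default', 'd-v009', Scaling.BASIC),
     [Version('default', 'd-v010', Scaling.BASIC)]),
    (Version('pubapi', 'p-v009', Scaling.MANUAL),
     [Version('pubapi', 'p-v010', Scaling.MANUAL)]),
    (Version('tools', 't-v009', Scaling.AUTOMATIC),
     [Version('tools', 't-v010', Scaling.AUTOMATIC)]),
]
demo_steps = order_steps(demo_plan)
```

Running the three phases as separate loops means no service loses its serving version before every target is started.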

@ -0,0 +1,129 @@
# Copyright 2020 The Nomulus Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""End-to-end test of rollback."""
import textwrap
from typing import Any, Dict
import unittest
from unittest import mock
import appengine_test
import gcs_test
import plan
def _make_serving_version(service: str, version: str) -> Dict[str, Any]:
"""Creates description of one serving version in API response."""
return {
'split': {
'allocations': {
version: 1,
}
},
'id': service
}
def _make_version_config(version,
scaling: str,
instance_tag: str,
instances: int = 10) -> Dict[str, Any]:
"""Creates one version config as part of an API response."""
return {scaling: {instance_tag: instances}, 'id': version}
class RollbackTestCase(unittest.TestCase):
"""End-to-end test of rollback."""
def setUp(self) -> None:
self._appengine_admin, self._appengine_request = (
appengine_test.setup_appengine_admin())
self._gcs_client, self._schema_tag, self._version_map = (
gcs_test.setup_gcs_client('crash'))
self.addCleanup(mock.patch.stopall)
def test_rollback_success(self):
self._schema_tag.download_as_text.return_value = (
'nomulus-20201014-RC00')
self._version_map.download_as_text.return_value = textwrap.dedent("""\
nomulus-20201014-RC00,backend,nomulus-backend-v009
nomulus-20201014-RC00,default,nomulus-default-v009
nomulus-20201014-RC00,pubapi,nomulus-pubapi-v009
nomulus-20201014-RC00,tools,nomulus-tools-v009
nomulus-20201014-RC01,backend,nomulus-backend-v011
nomulus-20201014-RC01,default,nomulus-default-v010
nomulus-20201014-RC01,pubapi,nomulus-pubapi-v010
nomulus-20201014-RC01,tools,nomulus-tools-v010
""")
self._appengine_request.execute.side_effect = [
# Response to get_serving_versions:
{
'services': [
_make_serving_version('backend', 'nomulus-backend-v011'),
_make_serving_version('default', 'nomulus-default-v010'),
_make_serving_version('pubapi', 'nomulus-pubapi-v010'),
_make_serving_version('tools', 'nomulus-tools-v010')
]
},
# Responses to get_version_configs. AppEngineAdmin queries the
# services in alphabetical order to facilitate this test.
{
'versions': [
_make_version_config('nomulus-backend-v009',
'basicScaling', 'maxInstances'),
_make_version_config('nomulus-backend-v011',
'basicScaling', 'maxInstances')
]
},
{
'versions': [
_make_version_config('nomulus-default-v009',
'basicScaling', 'maxInstances'),
_make_version_config('nomulus-default-v010',
'basicScaling', 'maxInstances')
]
},
{
'versions': [
_make_version_config('nomulus-pubapi-v009',
'manualScaling', 'instances'),
_make_version_config('nomulus-pubapi-v010',
'manualScaling', 'instances')
]
},
{
'versions': [
_make_version_config('nomulus-tools-v009',
'automaticScaling',
'maxTotalInstances'),
_make_version_config('nomulus-tools-v010',
'automaticScaling',
'maxTotalInstances')
]
}
]
steps = plan.get_rollback_plan(self._gcs_client, self._appengine_admin,
'crash', 'nomulus-20201014-RC00')
self.assertEqual(len(steps), 14)
self.assertRegex(steps[0].info(),
'.*nom_build :integration:sqlIntegrationTest.*')
self.assertRegex(steps[1].info(), '.*gcloud app versions start.*')
self.assertRegex(steps[5].info(),
'.*gcloud app services set-traffic.*')
self.assertRegex(steps[9].info(), '.*gcloud app versions stop.*')
# Escape the pipe so it is matched literally instead of as alternation.
self.assertRegex(steps[13].info(),
r'.*echo nomulus-20201014-RC00 \| gsutil cp -.*')
if __name__ == '__main__':
unittest.main()
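
The test above depends on `side_effect` handing out the canned API responses one call at a time, in list order. A minimal sketch of that `unittest.mock` behavior, using a hypothetical `request` mock and made-up payloads:

```python
from unittest import mock

# Hypothetical request object; only the ordering behavior matters here.
request = mock.Mock()
request.execute.side_effect = [
    {'services': [{'id': 'backend'}]},
    {'versions': [{'id': 'nomulus-backend-v009'}]},
]

# Each call consumes the next item of the side_effect list.
first = request.execute()
second = request.execute()

# Once the list is exhausted, a further call raises StopIteration.
try:
    request.execute()
    exhausted = False
except StopIteration:
    exhausted = True
```

This is why the code under test must issue its requests in a deterministic order: a reordered call would receive the wrong canned response.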

@ -0,0 +1,178 @@
# Copyright 2020 The Nomulus Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Script to roll back the Nomulus server on AppEngine."""
import argparse
import dataclasses
import sys
import textwrap
from typing import Any, Optional, Tuple
import appengine
import gcs
import plan
MAIN_HELP = 'Script to roll back the Nomulus server on AppEngine.'
ROLLBACK_HELP = 'Rolls back Nomulus to the target release.'
GET_SERVING_RELEASE_HELP = 'Shows the release tag(s) of the serving versions.'
GET_RECENT_DEPLOYMENTS_HELP = ('Shows recently deployed versions and their '
'release tags.')
ROLLBACK_MODE_HELP = textwrap.dedent("""\
The execution mode.
- dryrun: Prints descriptions of all steps.
- interactive: Prompts for confirmation before executing
each step.
- auto: Executes all steps in one go.
""")
@dataclasses.dataclass(frozen=True)
class Argument:
"""Describes a command line argument.
This class is for use with argparse.ArgumentParser. Except for the
'arg_names' attribute which specifies the argument name and/or flags, all
other attributes must match an accepted parameter in the parser's
add_argument() method.
"""
arg_names: Tuple[str, ...]
help: str
default: Optional[Any] = None
required: bool = True
choices: Optional[Tuple[str, ...]] = None
def get_arg_attrs(self):
return dict((k, v) for k, v in vars(self).items() if k != 'arg_names')
ARGUMENTS = (Argument(('--dev_project', '-d'),
'The GCP project with Nomulus deployment records.'),
Argument(('--project', '-p'),
'The GCP project where the Nomulus server is deployed.'),
Argument(('--env', '-e'),
'The name of the Nomulus server environment.',
choices=('production', 'sandbox', 'crash', 'alpha')))
ROLLBACK_ARGUMENTS = (Argument(('--target_release', '-t'),
'The release to be deployed.'),
Argument(('--run_mode', '-m'),
ROLLBACK_MODE_HELP,
required=False,
default='dryrun',
choices=('dryrun', 'interactive', 'auto')))
def rollback(dev_project: str, project: str, env: str, target_release: str,
run_mode: str) -> None:
"""Rolls back a Nomulus server to the target release.
Args:
dev_project: The GCP project with deployment records.
project: The GCP project of the Nomulus server.
env: The environment name of the Nomulus server.
target_release: The tag of the release to be brought up.
run_mode: How to handle the rollback steps: print only (dryrun),
one step at a time with user confirmation (interactive),
or all steps in one shot (auto).
"""
steps = plan.get_rollback_plan(gcs.GcsClient(dev_project),
appengine.AppEngineAdmin(project), env,
target_release)
print('Rollback steps:\n\n')
for step in steps:
print(f'{step.info()}\n')
if run_mode == 'dryrun':
continue
if run_mode == 'interactive':
confirmation = input(
'Do you wish to (c)ontinue, (s)kip, or (a)bort? ')
if confirmation == 'a':
return
if confirmation == 's':
continue
step.execute()
def show_serving_release(dev_project: str, project: str, env: str) -> None:
"""Shows the release tag(s) of the currently serving versions."""
serving_versions = appengine.AppEngineAdmin(project).get_serving_versions()
versions_to_tags = gcs.GcsClient(dev_project).get_releases_by_versions(
env, serving_versions)
print(f'{project}:')
for version, tag in versions_to_tags.items():
print(f'{version.service_id}\t{version.version_id}\t{tag}')
def show_recent_deployments(dev_project: str, project: str, env: str) -> None:
"""Shows the release tags and versions of recent deployments."""
num_services = len(appengine.SERVICES)
num_records = 3 * num_services
print(f'{project}:')
for version, tag in gcs.GcsClient(dev_project).get_recent_deployments(
env, num_records).items():
print(f'{version.service_id}\t{version.version_id}\t{tag}')
def main() -> int:
parser = argparse.ArgumentParser(prog='nom_rollback',
description=MAIN_HELP)
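# Require a command so that a missing one yields a usage error instead
# of a KeyError in the dispatch table below.
subparsers = parser.add_subparsers(dest='command',
help='Supported commands',
required=True)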
rollback_parser = subparsers.add_parser(
'rollback',
help=ROLLBACK_HELP,
formatter_class=argparse.RawTextHelpFormatter)
for flag in ARGUMENTS:
rollback_parser.add_argument(*flag.arg_names, **flag.get_arg_attrs())
for flag in ROLLBACK_ARGUMENTS:
rollback_parser.add_argument(*flag.arg_names, **flag.get_arg_attrs())
show_serving_release_parser = subparsers.add_parser(
'show_serving_release', help=GET_SERVING_RELEASE_HELP)
for flag in ARGUMENTS:
show_serving_release_parser.add_argument(*flag.arg_names,
**flag.get_arg_attrs())
show_recent_deployments_parser = subparsers.add_parser(
'show_recent_deployments', help=GET_RECENT_DEPLOYMENTS_HELP)
for flag in ARGUMENTS:
show_recent_deployments_parser.add_argument(*flag.arg_names,
**flag.get_arg_attrs())
args = parser.parse_args()
command = args.command
args = {k: v for k, v in vars(args).items() if k != 'command'}
{
'rollback': rollback,
'show_recent_deployments': show_recent_deployments,
'show_serving_release': show_serving_release
}[command](**args)
return 0
if __name__ == '__main__':
try:
sys.exit(main())
except Exception as ex: # pylint: disable=broad-except
print(ex, file=sys.stderr)
sys.exit(1)
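
The `Argument` dataclass lets several subcommands share one flag definition. The sketch below reproduces that pattern in a self-contained form; the `demo` program name, the trimmed flag list, and `build_parser` are assumptions made for illustration only.

```python
import argparse
import dataclasses
from typing import Any, Optional, Tuple


@dataclasses.dataclass(frozen=True)
class Argument:
    """Describes one flag; fields other than arg_names feed add_argument."""
    arg_names: Tuple[str, ...]
    help: str
    default: Optional[Any] = None
    required: bool = True
    choices: Optional[Tuple[str, ...]] = None

    def get_arg_attrs(self):
        return {k: v for k, v in vars(self).items() if k != 'arg_names'}


# Hypothetical, shortened flag list for the demo.
DEMO_ARGUMENTS = (
    Argument(('--env', '-e'), 'Environment name.',
             choices=('crash', 'alpha')),
    Argument(('--run_mode', '-m'), 'Execution mode.', required=False,
             default='dryrun', choices=('dryrun', 'interactive', 'auto')),
)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='demo')
    subparsers = parser.add_subparsers(dest='command')
    rollback_parser = subparsers.add_parser('rollback')
    for flag in DEMO_ARGUMENTS:
        rollback_parser.add_argument(*flag.arg_names,
                                     **flag.get_arg_attrs())
    return parser


args = build_parser().parse_args(['rollback', '-e', 'crash'])
# Drop the subcommand name so the rest can be passed on as keyword args.
kwargs = {k: v for k, v in vars(args).items() if k != 'command'}
```

Because the remaining keys match the handler's parameter names, a call such as `handler(**kwargs)` needs no per-command glue code.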

152
release/rollback/steps.py Normal file

@ -0,0 +1,152 @@
# Copyright 2020 The Nomulus Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Definition of rollback steps and factory methods to create them."""
import dataclasses
import subprocess
import textwrap
from typing import Tuple
import appengine
import common
@dataclasses.dataclass(frozen=True)
class RollbackStep:
"""One rollback step.
Most steps are implemented using commandline tools, e.g., gcloud and
gsutil, and execute their commands by forking a subprocess. Each step
also has an info method that returns its command with a description.
Two steps are handled differently. The _UpdateDeployTag step is described
by a piped shell command, so it feeds the tag to gsutil through stdin
instead of running a single command. The _SetManualScalingNumInstances
step uses the AppEngine Admin API client in this package to set the number
of instances, because the Nomulus set_num_instances command is currently
broken.
"""
description: str
command: Tuple[str, ...]
def info(self) -> str:
return f'# {self.description}\n' f'{" ".join(self.command)}'
def execute(self) -> None:
"""Executes the step.
Raises:
CannotRollbackError: The command fails.
"""
if subprocess.call(self.command) != 0:
raise common.CannotRollbackError(f'Failed: {self.description}')
def check_schema_compatibility(dev_project: str, nom_tag: str,
sql_tag: str) -> RollbackStep:
return RollbackStep(description='Check compatibility with SQL schema.',
command=(f'{common.get_nomulus_root()}/nom_build',
':integration:sqlIntegrationTest',
f'--schema_version={sql_tag}',
f'--nomulus_version={nom_tag}',
'--publish_repo='
f'gcs://{dev_project}-deployed-tags/maven'))
@dataclasses.dataclass(frozen=True)
class _SetManualScalingNumInstances(RollbackStep):
"""Sets the number of instances for a manual scaling version.
The Nomulus set_num_instances command is currently broken. This step uses
the AppEngine REST API to update the version.
"""
appengine_admin: appengine.AppEngineAdmin
version: common.VersionKey
num_instance: int
def execute(self) -> None:
self.appengine_admin.set_manual_scaling_num_instance(
self.version.service_id, self.version.version_id,
self.num_instance)
def set_manual_scaling_instances(appengine_admin: appengine.AppEngineAdmin,
version: common.VersionConfig,
num_instances: int) -> RollbackStep:
cmd_description = textwrap.dedent("""\
Nomulus set_num_instances command is currently broken.
This script uses the AppEngine REST API to update the version.
To set this value without using this tool, you may use the REST API at
https://cloud.google.com/appengine/docs/admin-api/reference/rest/v1beta/apps.services.versions/patch
""")
return _SetManualScalingNumInstances(
f'Set number of instances for manual-scaling version '
f'{version.version_id} in {version.service_id} to {num_instances}.',
(cmd_description, ''), appengine_admin, version, num_instances)
def start_or_stop_version(project: str, action: str,
version: common.VersionKey) -> RollbackStep:
"""Creates a rollback step that starts or stops an AppEngine version.
Args:
project: The GCP project of the AppEngine application.
action: Either 'start' or 'stop', passed to gcloud as the subcommand.
version: The version being managed.
"""
return RollbackStep(
f'{action.title()} {version.version_id} in {version.service_id}',
('gcloud', 'app', 'versions', action, version.version_id, '--quiet',
'--service', version.service_id, '--project', project))
def direct_service_traffic_to_version(
project: str, version: common.VersionKey) -> RollbackStep:
return RollbackStep(
f'Direct all traffic to {version.version_id} in {version.service_id}.',
('gcloud', 'app', 'services', 'set-traffic', version.service_id,
'--quiet', f'--splits={version.version_id}=1', '--project', project))
@dataclasses.dataclass(frozen=True)
class _UpdateDeployTag(RollbackStep):
"""Updates the deployment tag on GCS."""
nom_tag: str
destination: str
def execute(self) -> None:
with subprocess.Popen(('gsutil', 'cp', '-', self.destination),
stdin=subprocess.PIPE) as p:
try:
p.communicate(self.nom_tag.encode('utf-8'))
if p.wait() != 0:
raise common.CannotRollbackError(
f'Failed: {self.description}')
except:
p.kill()
raise
def update_deploy_tags(dev_project: str, env: str,
nom_tag: str) -> RollbackStep:
destination = f'gs://{dev_project}-deployed-tags/nomulus.{env}.tag'
return _UpdateDeployTag(
f'Update Nomulus tag in {env}',
(f'echo {nom_tag} | gsutil cp - {destination}', ''), nom_tag,
destination)
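
The `_UpdateDeployTag.execute` method feeds the tag to a child process through stdin rather than through a shell pipeline. A minimal sketch of that pattern follows; a small Python child process stands in for `gsutil cp -`, and `pipe_to_command` and `CHECK_STDIN` are hypothetical names used only for this demo.

```python
import subprocess
import sys


def pipe_to_command(command, payload: str) -> None:
    """Writes payload to the command's stdin and checks the exit code."""
    with subprocess.Popen(command, stdin=subprocess.PIPE) as p:
        try:
            # communicate() writes the bytes, closes stdin, and waits.
            p.communicate(payload.encode('utf-8'))
            if p.wait() != 0:
                raise RuntimeError(f'Command failed: {command}')
        except BaseException:
            # Make sure the child does not outlive a failed step.
            p.kill()
            raise


# Stand-in child: exits 0 only if it receives the expected tag on stdin.
CHECK_STDIN = (
    sys.executable, '-c',
    'import sys; '
    'sys.exit(0 if sys.stdin.read() == "nomulus-20201014-RC00" else 1)')

pipe_to_command(CHECK_STDIN, 'nomulus-20201014-RC00')
```

Passing the command as a tuple and writing to stdin directly avoids invoking a shell, so the tag is never subject to shell quoting.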

5
rollback_tool Executable file

@ -0,0 +1,5 @@
#!/bin/sh
# Wrapper for rollback_tool.py.
cd "$(dirname "$0")"
python3 ./release/rollback/rollback_tool.py "$@"
exit $?