Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block extensions disallowed by policy #3259

Open
wants to merge 12 commits into
base: develop
Choose a base branch
from

Conversation

mgunnala
Copy link

Description

Issue #


PR #2 for the policy engine allowlist feature:

  • invoke policy engine from exthandlers.py
  • block all extensions and report status if any errors are thrown during engine initialization
  • block any extensions that are disallowed by policy
  • add unit and e2e tests

PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

Copy link

codecov bot commented Nov 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.77%. Comparing base (3aebcdd) to head (a37508f).
Report is 328 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3259      +/-   ##
===========================================
+ Coverage    71.97%   72.77%   +0.79%     
===========================================
  Files          103      114      +11     
  Lines        15692    17081    +1389     
  Branches      2486     2277     -209     
===========================================
+ Hits         11295    12431    +1136     
- Misses        3881     4107     +226     
- Partials       516      543      +27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@maddieford maddieford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did an initial review not including tests

azurelinuxagent/ga/policy/policy_engine.py Outdated Show resolved Hide resolved
"""
# TODO: when CRP adds terminal error code for policy-related extension failures, set that as the default code.
def __init__(self, msg, inner=None, code=-1):
msg = "Extension is disallowed by agent policy and will not be processed: {0}".format(msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case where agent failed to parse policy, I'm not sure we should say 'Extension is disallowed by policy'. In this case, extension is disallowed because there's some issue reading or parsing the policy.

I also am hesitant about 'agent policy' since policy is provided by customer

Copy link
Author

@mgunnala mgunnala Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change this to "Extension will not be processed: "

Parsing errors (InvalidPolicyError) would look like "Extension will not be processed: customer-provided policy file (path) is invalid, please correct the following error..."

Extension disallowed errors (ExtensionPolicyError) would look like "Extension will not be processed: failed to enable extension CustomScript because extension is not specified in policy allowlist. To enable, add extension to the allowed list in policy file (path)."

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the error message as discussed above.

policy_op, policy_err_code = policy_err_map.get(ext_handler.state)
if policy_error is not None:
err = ExtensionPolicyError(msg="", inner=policy_error, code=policy_err_code)
self.__handle_and_report_ext_handler_errors(handler_i, err,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this create .status files for single config extensions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new function __handle_and_report_policy_error() - this should create a status file for any extension with settings.

ext_handler.name,
conf.get_policy_file_path())
err = ExtensionPolicyError(msg, code=policy_err_code)
self.__handle_and_report_ext_handler_errors(handler_i, err,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question here about .status file for single config extensions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new function __handle_and_report_policy_error() - this should create a status file for any extension with settings.

Copy link
Contributor

@maddieford maddieford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. I'm going to spend tomorrow going through each e2e scenario and unit test, sorry for the slow review :/

ExtensionRequestedState.Enabled: ('enable', ExtensionErrorCodes.PluginEnableProcessingFailed),
# TODO: CRP does not currently have a terminal error code for uninstall. Once CRP adds
# an error code for uninstall or for policy, use this code instead of PluginDisableProcessingFailed
# Note that currently, CRP waits for 90 minutes to time out for a failed uninstall operation, instead of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add some more detail to this comment?
Something like:

Note that currently, CRP will poll until the agent does not report a status for an extension that should be uninstalled. In the case of a policy error, the agent will report a failed status on behalf of the extension, which will cause CRP to poll for the full timeout period, instead of failing fast.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@@ -692,6 +734,26 @@ def __handle_and_report_ext_handler_errors(ext_handler_i, error, report_op, mess
add_event(name=name, version=handler_version, op=report_op, is_success=False, log_event=True,
message=message)

@staticmethod
def __handle_and_report_policy_error(ext_handler_i, error, report_op, message, report=True, extension=None):
# TODO: Consider merging this function with __handle_and_report_ext_handler_errors() above.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please leave some comment explaining why we broke this into a separate function? For policy related failures, we want to fail extensions fast. CRP will continue to poll for single-config ext status until timeout, so agent should write a status for single-config extensions. The other function does not create that status and we didn't want to touch the other function without investigating the impact of that change further

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added


# Create status file for extensions with settings (single and multi config).
if extension is not None:
ext_handler_i.create_status_file_if_not_exist(extension, status=ExtensionStatusValue.error, code=error.code,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_status_file_if_not_exist() will not overwrite existing status file (for the current sequence number). Is this behavior acceptable?

ExtensionRequestedState.Enabled: ('enable', ExtensionErrorCodes.PluginEnableProcessingFailed),
# Note: currently, when uninstall is requested for an extension, CRP polls until the agent does not
# report status for that extension, or until timeout is reached. In the case of a policy error, the
# agent reports failed status on behalf of the extension, which will cause CRP to for the full timeout,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# agent reports failed status on behalf of the extension, which will cause CRP to for the full timeout,
# agent reports failed status on behalf of the extension, which will cause CRP to poll for the full timeout,

nit

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

@@ -3507,5 +3510,144 @@ def test_report_msg_if_handler_manifest_contains_invalid_values(self):
self.assertIn("'supportsMultipleExtensions' has a non-boolean value", kw_messages[2]['message'])


class TestExtensionPolicy(TestExtensionBase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add a test case for extension is allowed by policy

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

name: "ExtensionPolicy"
tests:
- "ext_policy/ext_policy.py"
images: "random(endorsed)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should run this on more distros so we can get better coverage before releasing the changes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will running on all endorsed distros add too much overhead to the daily runs?

# Prepare extensions to test
unique = str(uuid.uuid4())
test_file = f"waagent-test.{unique}"
custom_script = ExtPolicy.TestCase(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a comment explaining why you chose each of these extensions (single-config, multi-config, etc), so that it is obvious to anyone who reviews this test case in the future

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

fail(f"The agent should have reported an error trying to {operation} {extension_case.extension.__str__()} "
f"because the extension is disallowed by policy.")
except Exception as error:
assert_that("Extension will not be processed" in str(error)) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also check for [ExtensionPolicyError] in the message to confirm the failure was due to policy

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

azurelinuxagent/ga/exthandlers.py Show resolved Hide resolved
@@ -630,6 +630,70 @@ def test_it_should_handle_and_report_enable_errors_properly(self):
}
self._assert_extension_status(sc_handler, expected_extensions)

def test_it_should_handle_and_report_disallowed_extensions_properly(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also please add a case for multi config ext allowed by policy

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

name: "ExtPolicyWithDependencies"
tests:
- "ext_policy/ext_policy_with_dependencies.py"
images: "random(endorsed)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment here, we should get more coverage than 1 run per day, maybe consider running on all endorsed, or 5-10 endorsed images per day

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to all endorsed, but I can change to 5-10 if this adds too much overhead.

}
}
self._create_policy_file(policy)
self._operation_should_succeed("enable", custom_script)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add the test case where single-config ext should fail due to policy

Copy link
Author

@mgunnala mgunnala Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the test case on line 189 sufficient, or did you mean something else?
(I will also add comments to make it clearer that this case is being tested)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants