-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support CGroup v2 on Supervised with manual addon restarts #5419
base: main
Are you sure you want to change the base?
Conversation
dfc53e7
to
b35728e
Compare
📝 WalkthroughWalkthroughThe changes involve enhancements to the error handling and state management within the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Addon
participant DockerAddon
participant ResolutionSystem
User->>Addon: Start Addon
Addon->>ResolutionSystem: Check device access
ResolutionSystem-->>Addon: Return access issue if missing
Addon->>Addon: Manage state and dismiss issues
Addon->>User: Addon started or error logged
User->>DockerAddon: Stop Docker Addon
DockerAddon->>ResolutionSystem: Check for issues
ResolutionSystem-->>DockerAddon: Return device access issue if present
DockerAddon->>DockerAddon: Dismiss issue
DockerAddon->>User: Addon stopped or error logged
User->>FixupAddonExecuteRestart: Restart Addon
FixupAddonExecuteRestart->>Addon: Attempt to stop
Addon-->>FixupAddonExecuteRestart: Return status
FixupAddonExecuteRestart->>Addon: Attempt to start
Addon-->>FixupAddonExecuteRestart: Return status
FixupAddonExecuteRestart->>User: Return restart status
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (15)
supervisor/resolution/fixups/addon_execute_restart.py (5)
1-1
: Enhance the module docstring with more contextThe current docstring is quite brief. Consider expanding it to include the CGroup v2 context and explain when this helper is used.
-"""Helpers to fix addon by restarting it.""" +"""Helpers to fix addon by restarting it. + +This module provides functionality to restart addons when device access needs to be +updated under CGroup v2. Unlike CGroup v1, device access cannot be added dynamically +to running containers in CGroup v2, necessitating a container restart. +"""
13-15
: Improve setup function docstringThe current docstring doesn't explain the function's purpose clearly.
- """Check setup function.""" + """Initialize and return a FixupAddonExecuteRestart instance. + + Args: + coresys: Core system instance for dependency injection + + Returns: + FixupBase: Configured fixup instance for addon restart operations + """
21-22
: Fix incorrect docstringThe docstring describes initialization but this method processes the fixup.
- """Initialize the fixup class.""" + """Process the addon restart fixup. + + Args: + reference: The addon slug to restart + + Raises: + ResolutionFixupError: If the addon cannot be stopped + """
42-60
: Add return type hints to propertiesThe property return types should be explicitly annotated for better code maintainability.
@property - def suggestion(self) -> SuggestionType: + def suggestion(self) -> SuggestionType | None: """Return a SuggestionType enum.""" return SuggestionType.EXECUTE_RESTART @property - def context(self) -> ContextType: + def context(self) -> ContextType | None: """Return a ContextType enum.""" return ContextType.ADDON @property - def issues(self) -> list[IssueType]: + def issues(self) -> list[IssueType] | None: """Return a IssueType enum list.""" return [IssueType.DEVICE_ACCESS_MISSING] @property def auto(self) -> bool: - """Return if a fixup can be apply as auto fix.""" + """Return if a fixup can be applied as auto fix.""" return False
18-20
: Consider adding CGroup version validationThe class implements CGroup v2 specific behavior but doesn't validate if CGroup v2 is actually in use.
Consider adding a check in
process_fixup
to verify the CGroup version and handle cases where CGroup v1 is still in use. This would make the code more robust against system configuration changes.tests/resolution/evaluation/test_evaluate_cgroup.py (1)
Line range hint
1-85
: Consider adding specific test cases for device access scenarios.While the current test coverage is good for CGroup version evaluation, consider adding test cases that specifically verify:
- The behavior when device access is requested in CGroup v2
- The system's response indicating the need for manual addon restart
Example test case structure:
async def test_cgroup_v2_device_access(coresys: CoreSys): """Test device access behavior with CGroup v2.""" cgroup_version = EvaluateCGroupVersion(coresys) coresys.docker.info.cgroup = CGROUP_V2_VERSION # Add assertions for device access scenariossupervisor/resolution/const.py (1)
83-83
: LGTM! Consider adding docstring documentationThe new issue type aligns well with the PR objectives for handling device access limitations in CGroup v2. Consider adding docstring documentation to the
IssueType
enum to describe when this issue is raised and how it relates to CGroup v2 limitations.tests/resolution/fixup/test_addon_execute_restart.py (4)
19-26
: Maintain consistency in addon reference values.The
DEVICE_ACCESS_MISSING_ISSUE
usesTEST_ADDON_SLUG
whileEXECUTE_RESTART_SUGGESTION
hardcodes "local_ssh". Consider usingTEST_ADDON_SLUG
consistently for both constants.EXECUTE_RESTART_SUGGESTION = Suggestion( - SuggestionType.EXECUTE_RESTART, ContextType.ADDON, reference="local_ssh" + SuggestionType.EXECUTE_RESTART, ContextType.ADDON, reference=TEST_ADDON_SLUG )
36-38
: Remove unused mock_stop function.This function is defined but never used in the test. The stopping behavior is handled by the
DockerInterface.stop
mock instead.
43-48
: Improve readability of context manager setup.Consider breaking down the mock setup for better readability:
- with ( - patch.object(DockerInterface, "stop") as stop, - patch.object(DockerAddon, "run") as run, - patch.object(Addon, "_wait_for_startup"), - patch.object(Addon, "write_options"), - ): + with patch.object(DockerInterface, "stop") as stop, \ + patch.object(DockerAddon, "run") as run, \ + patch.object(Addon, "_wait_for_startup"), \ + patch.object(Addon, "write_options"):
63-63
: Fix inconsistent variable naming.The variable name
addon_execute_start
should beaddon_execute_restart
to match the class nameFixupAddonExecuteRestart
. This occurs in multiple test functions.- addon_execute_start = FixupAddonExecuteRestart(coresys) + addon_execute_restart = FixupAddonExecuteRestart(coresys)Also applies to: 87-87, 108-108
supervisor/docker/addon.py (1)
835-844
: Add documentation for CGroup v2 handling.Consider adding a docstring or comment explaining the CGroup v2 specific handling and why a restart is required when device access is needed.
Add documentation like this:
+ # When running on CGroup v2 without OS support, device access cannot be + # modified dynamically. A container restart is required to apply changes. if ( self.sys_docker.info.cgroup == CGROUP_V2_VERSION and not self.sys_os.available ):supervisor/addons/addon.py (1)
164-168
: Consider enhancing the docstring.The implementation is correct, but the docstring could be more descriptive about when this issue is raised and how it relates to CGroup v2 limitations.
@property def device_access_missing_issue(self) -> Issue: - """Get issue used if device access is missing and can't be automatically added.""" + """Get issue used when device access is missing due to CGroup v2 limitations. + + This issue is raised when a device is hotplugged and cannot be automatically + added to a running container due to CGroup v2 restrictions, requiring a manual + addon restart to gain access to the device. + """ return self._device_access_missing_issuetests/docker/test_addon.py (2)
438-438
: Usepatch
to mock methods instead of direct assignmentDirectly assigning
coresys.hardware.disk.get_disk_free_space = lambda x: 5000
modifies the method globally, which could affect other tests and lead to unintended side effects.Consider using
unittest.mock.patch
to temporarily mock the method within the scope of the test to ensure isolation:with patch.object(coresys.hardware.disk, 'get_disk_free_space', return_value=5000): # rest of the test code
454-455
: Avoid using arbitrary sleep in asynchronous testsUsing
await asyncio.sleep(0.01)
to wait for event processing can make the test flaky, as it relies on timing that might vary across different environments.Consider using synchronization primitives like
asyncio.Event
to wait for the event to be processed:event_processed = asyncio.Event() def on_event(*args, **kwargs): event_processed.set() coresys.bus.async_listen_once(BusEvent.HARDWARE_NEW_DEVICE, on_event) coresys.bus.fire_event(BusEvent.HARDWARE_NEW_DEVICE, TEST_HW_DEVICE) await event_processed.wait()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
supervisor/addons/addon.py
(3 hunks)supervisor/docker/addon.py
(4 hunks)supervisor/resolution/const.py
(3 hunks)supervisor/resolution/evaluations/cgroup.py
(2 hunks)supervisor/resolution/fixups/addon_execute_restart.py
(1 hunks)tests/docker/test_addon.py
(3 hunks)tests/resolution/evaluation/test_evaluate_cgroup.py
(1 hunks)tests/resolution/fixup/test_addon_execute_restart.py
(1 hunks)
🔇 Additional comments (16)
supervisor/resolution/evaluations/cgroup.py (2)
20-20
: LGTM! Verify documentation updates
The change to support both CGroup versions aligns with the PR objectives to enable CGroup v2 support in Supervised systems. This is a significant change that enables the manual addon restart workflow for device access.
Let's check if documentation needs updates to reflect this change:
#!/bin/bash
# Description: Look for documentation that might need updates regarding CGroup support
# Search for documentation files
echo "Searching for documentation files..."
fd -e md -e rst . docs/
# Search for CGroup-related content in documentation
echo "Searching for CGroup-related content..."
rg -i "cgroup|device.*access" docs/
5-5
: Verify the relocated constants
The constants have been moved to a centralized location, which is good for maintainability.
Let's verify the constants exist in their new location:
✅ Verification successful
Constants are correctly defined and imported
The constants CGROUP_V1_VERSION
and CGROUP_V2_VERSION
are properly defined in supervisor/resolution/const.py
and are being used correctly in the import statement. The constants are also being used consistently across the codebase, including in tests and Docker-related functionality.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify CGROUP constants are properly defined in const.py
rg "CGROUP_V[12]_VERSION.*=.*" supervisor/resolution/const.py
Length of output: 374
Script:
#!/bin/bash
# Let's first check if the const.py file exists and examine its content
fd "const.py" supervisor/resolution/
# If found, let's examine its content
fd "const.py" supervisor/resolution/ -x cat {}
# Let's also search for these constants in the entire codebase
# to ensure we're not missing any alternate locations
rg "CGROUP_V[12]_VERSION" -A 1
Length of output: 5520
supervisor/resolution/fixups/addon_execute_restart.py (1)
34-41
: Reconsider error handling strategy for addon start
The current implementation silently logs start failures. While the comment explains why, this approach might hide important issues from the user.
Let's verify if this matches the intended behavior:
Consider one of these alternatives:
- Raise a different error type for start failures
- Return a status indicating partial success
- Add a warning state to the addon
tests/resolution/evaluation/test_evaluate_cgroup.py (1)
29-29
: LGTM! Assertion change aligns with PR objectives.
The modified assertion correctly reflects that CGroup v2 should no longer be marked as unsupported in Supervised systems, which is consistent with the PR's goal to support CGroup v2 with manual addon restarts.
supervisor/resolution/const.py (2)
18-19
: LGTM! Well-defined constants for CGroup versions
The string values are appropriate for configuration and logging purposes.
119-119
: LGTM! Verify suggestion type usage
The new suggestion type aligns with the PR's requirement for manual addon restarts in CGroup v2 environments.
Let's verify how this suggestion type is used in the codebase:
✅ Verification successful
EXECUTE_RESTART suggestion type is properly implemented
The suggestion type is well integrated into the codebase:
- Implemented in
FixupAddonExecuteRestart
class with proper restart logic (stop and start) - Used in docker addon context to handle device access issues
- Comprehensive test coverage including error cases
- Non-automatic fixup (requires manual intervention)
- Correctly associated with
IssueType.DEVICE_ACCESS_MISSING
andContextType.ADDON
The implementation aligns with the PR's requirement for manual addon restarts in CGroup v2 environments.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check how the new suggestion type is implemented
# Expected: Should find references in resolution manager and addon-related files
# Search for direct references to the new suggestion type
rg "EXECUTE_RESTART" --type python
# Look for resolution manager implementations handling this suggestion
ast-grep --pattern 'def $_(self, suggestion: SuggestionType):
$$$
if suggestion == SuggestionType.EXECUTE_RESTART:
$$$
'
Length of output: 196
Script:
#!/bin/bash
# Let's try with correct ripgrep file type and more specific ast-grep pattern
# Search for direct references to the new suggestion type
rg "EXECUTE_RESTART" -t py
# Look for resolution manager implementations
ast-grep --pattern 'class $_ {
$$$
def process_suggestion(self, suggestion: $_):
$$$
EXECUTE_RESTART
$$$
}'
# Check imports of SuggestionType where EXECUTE_RESTART might be used
rg "from supervisor.resolution.const import.*SuggestionType" -t py
Length of output: 5797
Script:
#!/bin/bash
# Let's check the implementation of the restart fixup
# Check the addon restart fixup implementation
cat supervisor/resolution/fixups/addon_execute_restart.py
# Look for the resolution manager's suggestion processing
rg -A 10 "process_suggestion.*EXECUTE_RESTART" -t py
# Check the test implementation for restart suggestion
cat tests/resolution/fixup/test_addon_execute_restart.py
Length of output: 6283
tests/resolution/fixup/test_addon_execute_restart.py (2)
81-104
: Well-structured test for start failure scenario.
The test properly verifies that:
- The stop operation is still called
- Issues are cleaned up even on failure
- Appropriate error message is logged
106-120
: Good coverage of missing addon scenario.
The test effectively verifies the edge case of a missing addon, ensuring:
- No operations are attempted
- Issues and suggestions are cleaned up
- Clear error message is logged
supervisor/docker/addon.py (2)
13-13
: LGTM: Import changes are appropriate.
The new imports for evolve
and CGROUP_V2_VERSION
are correctly placed and necessary for the CGroup v2 support implementation.
Also applies to: 44-44
791-797
: LGTM: Clean handling of device access issues during container removal.
The code appropriately cleans up device access issues when the container is removed, preventing stale issues from persisting.
supervisor/addons/addon.py (2)
151-153
: LGTM! Well-structured issue tracking implementation.
The new _device_access_missing_issue
property follows the established pattern and correctly initializes the issue with appropriate type and context.
193-199
: LGTM! Proper cleanup of device access issues.
The implementation correctly dismisses device access issues when the addon stops, which is essential for the manual restart workflow with CGroup v2. This ensures that the issue is cleared when the addon is intentionally stopped or restarted.
tests/docker/test_addon.py (4)
392-413
: Constants for test devices are appropriate and well-defined
The definitions of TEST_DEV_PATH
, TEST_SYSFS_PATH
, and TEST_HW_DEVICE
provide clear and necessary fixtures for the tests, ensuring accurate simulation of device-related scenarios.
468-469
: Use patch
to mock methods instead of direct assignment
Similar to a previous comment, directly assigning coresys.hardware.disk.get_disk_free_space = lambda x: 5000
can affect other tests.
483-484
: Avoid using arbitrary sleep in asynchronous tests
As previously noted, using await asyncio.sleep(0.01)
may cause unreliable test results.
456-456
: Verify hardcoded device major and minor numbers
The assertion add_devices.assert_called_once_with(123, "c 0:0 rwm")
uses hardcoded major and minor numbers "0:0"
. This may not match the actual device numbers and could lead to test failures if the device numbers differ.
Run the following script to confirm the major and minor numbers of TEST_HW_DEVICE
:
Ensure that the major and minor numbers output by the script match "0:0". If they differ, update the assertion to use the correct numbers:
-add_devices.assert_called_once_with(123, "c 0:0 rwm")
+add_devices.assert_called_once_with(123, f"c {major}:{minor} rwm")
Proposed change
CGroup v2 does not support the ability to add device access on the fly to a container like we could with CGroup v1. It does not appear we are going to get that support and CGroup v1 support is completely dropped at this point so we will use a workaround on Supervised systems.
CGroup v2 will no longer be marked as unsupported on Supervised systems. If a device is hotplugged and we need to add access to a running addon we will instead create a repair asking the user to restart the addon. This will not be automatic, users will have to restart the addon manually in order for it to access the device.
Note that this only applies to Supervised systems with CGroup v2. HAOS uses a patched version of CGroup v2 with the ability to add device access on the fly so users on HAOS systems will never see this repair or have to restart addons for device access.
Type of change
Additional information
Checklist
ruff format supervisor tests
)If API endpoints or add-on configuration are added/changed:
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests