Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update logcollector telemetry with common properties #3242

Merged
merged 3 commits into from
Oct 11, 2024

Conversation

maddieford
Copy link
Contributor

Description

The log collector process first checks that cgroups are setup correctly, adding any telemetry events if there are unexpected results.

Previously, the common properties for the telemetry events were initialized in LogCollector init, but LogCollector was not initialized until after cgroup checks. As a result, if the process added any events during cgroup checks, the common properties were not initialized:

Message ContainerId RoleInstanceName TenantName
Unable to determine which cgroup version to use: [CGroupsException] Detected hybrid cgroup mode, but there are controllers available to be enabled in unified hierarchy: cpuset 00000000-0000-0000-0000-000000000000 RoleInstanceName_UNINITIALIZED TenantName_UNINITIALIZED

This PR updates the log collector process to initialize the common properties in telemetry events before any cgroup checks. It also adds a telemetry event for uncompressed file size before we expect the process to exit due to mem limit exceeded.

Issue #


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

@@ -208,28 +209,30 @@ def collect_logs(self, is_full_mode):
else:
logger.info("Running log collector mode normal")

LogCollector.initialize_telemetry()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this was happening in LogCollector init.
We don't initialize the LogCollector until after cgroup checks, which resulted in all cgroup check events having initialized common properties (see PR description)

# Check the cgroups unit
log_collector_monitor = None
tracked_controllers = []
if CollectLogsHandler.is_enabled_monitor_cgroups_check():
try:
cgroup_api = get_cgroup_api()
except InvalidCgroupMountpointException as e:
log_cgroup_warning("The agent does not support cgroups if the default systemd mountpoint is not being used: {0}".format(ustr(e)), send_event=True)
log_cgroup_warning("The agent does not support cgroups if the default systemd mountpoint is not being used: {0}".format(ustr(e)), op=WALAEventOperation.LogCollection, send_event=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating these events to be marked as LogCollection operations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the motivation, but odd to send logcollection events with a function named as "cgroup". Plus the local log will still have the cgroups mark ("[CGW"]).

Maybe just create a similar log_log_collection_warn function? or if you don't need the special mark on the local log just use event.warn?

@@ -76,7 +76,6 @@ def __init__(self, is_full_mode=False):
self._must_collect_files = self._expand_must_collect_files()
self._create_base_dirs()
self._set_logger()
self._initialize_telemetry()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now done in the azurelinuxagent.agent.Agent.collect_logs

_LOGGER.info("Uncompressed archive size is %s b", total_uncompressed_size)
msg = "Uncompressed archive size is {0} b".format(total_uncompressed_size)
_LOGGER.info(msg)
add_event(op=WALAEventOperation.LogCollection, message=msg)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this event before the process enters the main loop in an effort to collect this info for runs that fail due to mem limit exceeded

@@ -211,9 +211,8 @@ def test_log_collector_parses_commands_in_manifest(self):
diskinfo,""".format(folder_to_list, file_to_collect)

with patch("azurelinuxagent.ga.logcollector.MANIFEST_NORMAL", manifest):
with patch('azurelinuxagent.ga.logcollector.LogCollector._initialize_telemetry'):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These unit tests mocked _initialize_telemetry because it was in LogCollector init. Now that it has been removed from init, the mock is not necessary

Copy link

codecov bot commented Oct 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.54%. Comparing base (3aebcdd) to head (68ba023).
Report is 320 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3242      +/-   ##
===========================================
+ Coverage    71.97%   72.54%   +0.56%     
===========================================
  Files          103      114      +11     
  Lines        15692    16900    +1208     
  Branches      2486     2247     -239     
===========================================
+ Hits         11295    12260     +965     
- Misses        3881     4105     +224     
- Partials       516      535      +19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maddieford maddieford merged commit d79966d into Azure:develop Oct 11, 2024
13 checks passed
@maddieford maddieford deleted the logcollector_telem branch October 11, 2024 23:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants