Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically detect deleted resources #1386

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

rafaelweingartner
Copy link
Contributor

@rafaelweingartner rafaelweingartner commented May 28, 2024

This patch is created on top of #1385, #1381, #1387, and #1388. Therefore, we need to merge them first. Then, the extra commits here will disappear.

While executing some Gnocchi optimizations (#1307), we noticed that some deleted/removed resources do not have the "ended_at" field with a datetime. This can cause slowness with time, as more and more "zombie" resources are left there, and this has a direct impact in the MySQL queries executed with the aggregates API.

This patch introduces a new parameter called metric_inactive_after, which defines for how long a metric can go without receiving new data points until we consider it as inactive. Then, when all metrics of a resource are in inactive state, we can mark/consider the resource as removed.

@rafaelweingartner rafaelweingartner force-pushed the mark-resource-as-deleted-when-they-stop-receiving-measures branch 2 times, most recently from 67d710e to 9d08cc0 Compare May 28, 2024 20:24
run-upgrade-tests.sh Outdated Show resolved Hide resolved
gnocchi/chef.py Outdated Show resolved Hide resolved
gnocchi/chef.py Outdated Show resolved Hide resolved
gnocchi/chef.py Outdated Show resolved Hide resolved
@rafaelweingartner
Copy link
Contributor Author

Thanks @pedro-martins !

gnocchi/chef.py Outdated Show resolved Hide resolved
While executing some Gnocchi optimizations (gnocchixyz#1307), we noticed that some deleted/removed resources do not have the "ended_at" field with a datetime. This can cause slowness with time, as more and more "zombie" resources are left there, and this has a direct impact in the MySQL queries executed with the aggregates API.

This patch introduces a new parameter called `metric_inactive_after`, which defines for how long a metric can go without receiving new datapoints until we consider it as inactive. Then, when all metrics of a resource are in inactive state, we can mark/consider the resource as removed.
@rafaelweingartner rafaelweingartner force-pushed the mark-resource-as-deleted-when-they-stop-receiving-measures branch from 7764d2c to 6e76708 Compare July 3, 2024 15:16
@rafaelweingartner
Copy link
Contributor Author

@pedro-martins and @chungg I have rebased this PR and it is ready for your reviews

Copy link
Contributor

@pedro-martins pedro-martins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Rafael, nice feature, I just leave some comments here, but the code overall is good to me. Thanks for this patch.

gnocchi/chef.py Outdated Show resolved Hide resolved
gnocchi/chef.py Outdated Show resolved Hide resolved
gnocchi/chef.py Show resolved Hide resolved
gnocchi/opts.py Outdated Show resolved Hide resolved
run-upgrade-tests.sh Show resolved Hide resolved
@rafaelweingartner
Copy link
Contributor Author

Thanks @pedro-martins for your review!

gnocchi/chef.py Outdated Show resolved Hide resolved
gnocchi/indexer/sqlalchemy.py Outdated Show resolved Hide resolved
gnocchi/storage/__init__.py Outdated Show resolved Hide resolved
gnocchi/chef.py Outdated Show resolved Hide resolved
gnocchi/chef.py Outdated Show resolved Hide resolved
gnocchi/chef.py Show resolved Hide resolved
gnocchi/storage/__init__.py Outdated Show resolved Hide resolved
gnocchi/opts.py Show resolved Hide resolved
rafaelweingartner and others added 6 commits July 9, 2024 08:35
Co-authored-by: gord chung <5091603+chungg@users.noreply.github.com>
Co-authored-by: gord chung <5091603+chungg@users.noreply.github.com>
Co-authored-by: gord chung <5091603+chungg@users.noreply.github.com>
@rafaelweingartner
Copy link
Contributor Author

@chungg thanks for your review! I have added the code changes you suggested, and there are some remarks open, which I would like to hear your response before closing them.

@rafaelweingartner
Copy link
Contributor Author

Hello @chungg is there something else that needs to be addressed here?

@rafaelweingartner
Copy link
Contributor Author

Hello guys,
Is there something missing here to move on and merge it?

This patch is the base for some further optimizations that we are doing and that we would like to propose upstream.

Copy link
Member

@chungg chungg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm ok with this although i think it'd be better if the update logic was in sql rather than python. something like following but in orm. could also be done later.

update resource r set ended_at = %s where not exists (select 1 from metric m where m.resource_id=a.id AND m.last_measure_timestamp >= %s) and r.ended_at is null;

if resource.ended_at is not None:
LOG.info("Resource [%s] was marked with a timestamp for the "
"'ended_at' field. However, it received a "
"measurement for metric [%s]. Therefore, we undelete "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"measurement for metric [%s]. Therefore, we undelete "
"measurement for metric [%s]. Therefore, restoring "

op.add_column(
"metric", sqlalchemy.Column(
"last_measure_timestamp", sqlalchemy.DateTime,
nullable=False, server_default=func.current_timestamp()))
Copy link
Member

@chungg chungg Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't know if it makes sense to make this nullable? it's possible that this is disabled at first and gets a default value and is enabled later and immediately all of the resources can potentially get set to ended (until next update).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually nevermind. i just realised last_measure_timestamp is updated regardless if auto end logic is enabled or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants