
Improvement/zenko 4941 #2178

Open · wants to merge 11 commits into base: development/2.6
3 changes: 3 additions & 0 deletions .github/scripts/end2end/configs/zenko.yaml
@@ -108,6 +108,9 @@ spec:
command-timeout: "60s"
pending-job-poll-after-age: "10s"
pending-job-poll-check-interval: "10s"
scuba:
logging:
logLevel: debug
ingress:
workloadPlaneClass: 'nginx'
controlPlaneClass: 'nginx'
3 changes: 1 addition & 2 deletions tests/ctst/common/common.ts
@@ -353,8 +353,7 @@ When('the user tries to perform the current S3 action on the bucket {int} times
}
await runActionAgainstBucket(this, this.getSaved<ActionPermissionsType>('currentAction').action);
if (this.getResult().err) {
Contributor:

Maybe better to keep failing on error, except for "retriable" errors, which can safely be ignored, as this step is indeed about making multiple attempts.
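The behavior the reviewer suggests can be sketched as follows. This is a minimal illustration only: `isRetriableError`, `handleAttemptError`, and the specific error-code list are assumptions for the sketch, not part of the ctst framework.

```typescript
// Hypothetical set of S3 error codes that are safe to ignore between
// attempts of a repeated action (transient, worth retrying).
const RETRIABLE_ERRORS = new Set(['SlowDown', 'ServiceUnavailable', 'RequestTimeout']);

function isRetriableError(code: string): boolean {
    return RETRIABLE_ERRORS.has(code);
}

// Sketch of the loop body: keep failing on unexpected errors, but
// tolerate retriable ones, since this step is about repeated attempts.
function handleAttemptError(code: string): void {
    if (!isRetriableError(code)) {
        throw new Error(`Unexpected error during repeated action: ${code}`);
    }
    // Retriable: log and continue with the next attempt.
}
```

The current diff instead logs every error and continues, which is simpler but can mask genuine failures until the final evaluation step.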

// stop at any error, the error will be evaluated in a separated step
return;
this.logger.debug('Error during repeated action', { error: this.getResult().err });
}
await Utils.sleep(delay);
}
10 changes: 8 additions & 2 deletions tests/ctst/common/hooks.ts
@@ -17,7 +17,7 @@ import { cleanS3Bucket } from './common';
process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0';

const { atMostOnePicklePerTag } = parallelCanAssignHelpers;
const noParallelRun = atMostOnePicklePerTag(['@AfterAll', '@PRA', '@ColdStorage']);
const noParallelRun = atMostOnePicklePerTag(['@AfterAll', '@PRA', '@ColdStorage', '@Utilization']);

setParallelCanAssign(noParallelRun);

@@ -54,7 +54,13 @@ After(async function (this: Zenko, results) {
);
});

After({ tags: '@Quotas' }, async function () {
After({ tags: '@Quotas' }, async function (this: Zenko, results) {
if (results.result?.status === 'FAILED') {
this.logger.warn('quota was not cleaned for test', {
bucket: this.getSaved<string>('bucketName'),
});
return;
}
await teardownQuotaScenarios(this as Zenko);
});

80 changes: 78 additions & 2 deletions tests/ctst/features/quotas/Quotas.feature
@@ -68,6 +68,7 @@ Feature: Quota Management for APIs
@CronJob
@DataDeletion
@NonVersioned
@Utilization
Scenario Outline: Quotas are affected by deletion operations
Given an action "DeleteObject"
And a permission to perform the "PutObject" action
@@ -80,11 +81,11 @@
And a <userType> type
And an environment setup for the API
And an "existing" IAM Policy that "applies" with "ALLOW" effect for the current API
When I wait 3 seconds
When I wait 6 seconds
And I PUT an object with size <uploadSize>
Then the API should "fail" with "QuotaExceeded"
When i delete object "obj-1"
And I wait 3 seconds
And I wait 6 seconds
And I PUT an object with size <uploadSize>
Then the API should "succeed" with ""

@@ -96,3 +97,78 @@
| 100 | 200 | 0 | IAM_USER |
| 100 | 0 | 200 | IAM_USER |
| 100 | 200 | 200 | IAM_USER |

@2.6.0
@PreMerge
@Quotas
@CronJob
@DataDeletion
@NonVersioned
@Utilization
Scenario Outline: Quotas are affected by deletion operations between count items runs
Given an action "DeleteObject"
And a permission to perform the "PutObject" action
And a STORAGE_MANAGER type
And a bucket quota set to 1000 B
And an account quota set to 1000 B
And an upload size of 1000 B for the object "obj-1"
And a bucket quota set to <bucketQuota> B
And an account quota set to <accountQuota> B
And a <userType> type
And an environment setup for the API
And an "existing" IAM Policy that "applies" with "ALLOW" effect for the current API
When I wait 3 seconds
And I PUT an object with size <uploadSize>
Then the API should "fail" with "QuotaExceeded"
When the "count-items" cronjobs completes without error
# Wait for inflights to be read by SCUBA
When I wait 3 seconds
# At this point if negative inflights are not supported, write should
# not be possible, as the previous inflights are now part of the current
# metrics.
And i delete object "obj-1"
# Wait for inflights to be read by SCUBA
And I wait 3 seconds
And I PUT an object with size <uploadSize>
Then the API should "succeed" with ""

Examples:
| uploadSize | bucketQuota | accountQuota | userType |
| 100 | 200 | 0 | ACCOUNT |

@2.6.0
@PreMerge
@Quotas
@CronJob
@DataDeletion
@NonVersioned
Scenario Outline: Negative inflights do not allow to bypass the quota
Given an action "DeleteObject"
And a permission to perform the "PutObject" action
And a STORAGE_MANAGER type
And a bucket quota set to <bucketQuota> B
And an account quota set to <accountQuota> B
And an upload size of <uploadSize> B for the object "obj-1"
And a <userType> type
And an environment setup for the API
And an "existing" IAM Policy that "applies" with "ALLOW" effect for the current API
When I wait 3 seconds
And I PUT an object with size <uploadSize>
Then the API should "fail" with "QuotaExceeded"
# The inflights are now part of the current metrics
When the "count-items" cronjobs completes without error
# Wait for inflights to be read by SCUBA
When I wait 3 seconds
# At this point if negative inflights are not supported, write should
# not be possible, as the previous inflights are now part of the current
# metrics.
# Delete 200 B, inflights are now -200
And i delete object "obj-1"
# Wait for inflights to be read by SCUBA
And I wait 6 seconds
And I PUT an object with size 300
Then the API should "fail" with "QuotaExceeded"

Examples:
| uploadSize | bucketQuota | accountQuota | userType |
| 200 | 200 | 0 | ACCOUNT |
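The arithmetic this scenario exercises can be sketched as below. This is a hypothetical model for illustration, not the actual SCUBA/CloudServer quota check: the function name, signature, and the assumption that effective usage is floored at zero are all inferred from the scenario comments.

```typescript
// Hypothetical quota evaluation. Negative inflights (deletes not yet
// absorbed by a count-items run) reduce usage, but the effective total
// is floored at zero, so they cannot "bank" free quota.
function wouldExceedQuota(
    countedBytes: number,   // metrics from the last count-items run
    inflightBytes: number,  // bytes written (+) or deleted (-) since then
    uploadSize: number,
    quota: number,
): boolean {
    const effectiveUsage = Math.max(0, countedBytes + inflightBytes);
    return effectiveUsage + uploadSize > quota;
}
```

In the Examples row above: 200 B are counted, deleting obj-1 puts inflights at -200 B, so effective usage is 0 B; a 300 B PUT still exceeds the 200 B quota, which is why the scenario expects `QuotaExceeded`.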
14 changes: 14 additions & 0 deletions tests/ctst/steps/quotas/quotas.ts
@@ -6,6 +6,7 @@ import { Scality, Command, Utils, AWSCredentials, Constants, Identity, IdentityE
import { createJobAndWaitForCompletion } from '../utils/kubernetes';
import { createBucketWithConfiguration, putObject } from '../utils/utils';
import { hashStringAndKeepFirst20Characters } from 'common/utils';
import assert from 'assert';

export async function prepareQuotaScenarios(world: Zenko, scenarioConfiguration: ITestCaseHookParameter) {
/**
@@ -136,6 +137,16 @@ Given('a bucket quota set to {int} B', async function (this: Zenko, quota: numbe
result,
});

// Ensure the quota is set
const resultGet: Command = await Scality.getBucketQuota(
this.parameters,
this.getCommandParameters());
this.logger.debug('GetBucketQuota result', {
resultGet,
});

assert(resultGet.stdout.includes(`${quota}`));

if (result.err) {
throw new Error(result.err);
}
@@ -158,6 +169,9 @@ Given('an account quota set to {int} B', async function (this: Zenko, quota: num
result,
});

// Ensure the quota is set
assert(JSON.parse(result.stdout).quota === quota);

if (result.err) {
throw new Error(result.err);
}
69 changes: 53 additions & 16 deletions tests/ctst/steps/utils/kubernetes.ts
@@ -1,3 +1,6 @@
import fs from 'fs';
import * as path from 'path';
import lockFile from 'proper-lockfile';
import { KubernetesHelper, Utils } from 'cli-testing';
import Zenko from 'world/Zenko';
import {
@@ -71,12 +74,39 @@ export function createKubeCustomObjectClient(world: Zenko): CustomObjectsApi {
return KubernetesHelper.customObject;
}

export async function createJobAndWaitForCompletion(world: Zenko, jobName: string, customMetadata?: string) {
export async function createJobAndWaitForCompletion(
world: Zenko,
jobName: string,
customMetadata?: string
) {
const batchClient = createKubeBatchClient(world);
const watchClient = createKubeWatchClient(world);

const lockFilePath = path.join('/tmp', `${jobName}.lock`);
let releaseLock: (() => Promise<void>) | false = false;

if (!fs.existsSync(lockFilePath)) {
fs.writeFileSync(lockFilePath, '');
}
francoisferrand (Contributor), Dec 11, 2024:

If the lock file exists, it means another test is waiting for this job: in that case, should the function not simply wait (polling) until the lock is removed? (i.e. the "first" test has completed its wait, and the job has indeed completed)

Author:

The problem with polling like that is that two tests may detect that the file does not exist at the same time, hence the use of the lock file, which internally relies on directory creation (which is atomic) and last-modified dates, to ensure the whole approach is safe.

francoisferrand (Contributor):

Instead of `if (!fs.existsSync(lockFilePath)) { fs.writeFileSync(lockFilePath, ''); }`, we can do `fs.writeFileSync(path, data, { flag: 'wx' })` to atomically create the file without overwriting, so only one caller will ever succeed. Depending on the outcome of the creation, the code can decide whether it should proceed with reading the cron job, or simply wait because something else is already doing it.
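The exclusive-creation idea can be sketched like this (Node's `wx` open flag makes the write fail with `EEXIST` if the file already exists; `tryAcquireMarker` is an illustrative name, not part of the PR):

```typescript
import fs from 'fs';

// Atomically create a marker file; among concurrent callers, exactly one
// succeeds. Node's 'wx' flag makes writeFileSync throw EEXIST if the
// file already exists, so the existence check and creation are one step.
function tryAcquireMarker(markerPath: string): boolean {
    try {
        fs.writeFileSync(markerPath, '', { flag: 'wx' });
        return true; // this process created the file: it should run the job
    } catch (err) {
        if ((err as NodeJS.ErrnoException).code === 'EEXIST') {
            return false; // someone else holds the marker: wait instead
        }
        throw err; // unexpected filesystem error
    }
}
```

This closes the race in the `existsSync`-then-`writeFileSync` sequence; the PR's `proper-lockfile` approach achieves the same atomicity via `mkdir` and additionally handles stale locks from crashed holders.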


try {
releaseLock = await lockFile.lock(lockFilePath, {
// Expect that the queued jobs take no more than 10 minutes to complete
stale: 10 * 60 * 1000,
// use a linear backoff strategy
retries: {
retries: 610,
factor: 1,
minTimeout: 1000,
maxTimeout: 1000,
},
});
world.logger.debug(`Acquired lock for job: ${jobName}`);

// Read the cron job and prepare the job spec
const cronJob = await batchClient.readNamespacedCronJob(jobName, 'default');
const cronJobSpec = cronJob.body.spec?.jobTemplate.spec;

const job = new V1Job();
const metadata = new V1ObjectMeta();
job.apiVersion = 'batch/v1';
@@ -87,50 +117,57 @@ export async function createJobAndWaitForCompletion(world: Zenko, jobName: strin
'cronjob.kubernetes.io/instantiate': 'ctst',
};
if (customMetadata) {
metadata.annotations = {
custom: customMetadata,
};
metadata.annotations.custom = customMetadata;
}
job.metadata = metadata;

// Create the job
const response = await batchClient.createNamespacedJob('default', job);
world.logger.debug('job created', {
job: response.body.metadata,
});
world.logger.debug('Job created', { job: response.body.metadata });

const expectedJobName = response.body.metadata?.name;

// Watch for job completion
await new Promise<void>((resolve, reject) => {
void watchClient.watch(
'/apis/batch/v1/namespaces/default/jobs',
{},
(type: string, apiObj, watchObj) => {
if (job.metadata?.name && expectedJobName &&
(watchObj.object?.metadata?.name as string)?.startsWith?.(expectedJobName)) {
if (
expectedJobName &&
(watchObj.object?.metadata?.name as string)?.startsWith?.(expectedJobName)
) {
if (watchObj.object?.status?.succeeded) {
world.logger.debug('job succeeded', {
job: job.metadata,
});
world.logger.debug('Job succeeded', { job: job.metadata });
resolve();
} else if (watchObj.object?.status?.failed) {
world.logger.debug('job failed', {
world.logger.debug('Job failed', {
job: job.metadata,
object: watchObj.object,
});
reject(new Error('job failed'));
reject(new Error('Job failed'));
}
}
}, reject);
},
reject
);
});
} catch (err: unknown) {
world.logger.error('error creating job', {
world.logger.error('Error creating or waiting for job completion', {
jobName,
err,
});
throw err;
} finally {
// Ensure the lock is released
if (releaseLock) {
await releaseLock();
world.logger.debug(`Released lock for job: ${jobName}`);
}
}
}


export async function waitForZenkoToStabilize(
world: Zenko, needsReconciliation = false, timeout = 15 * 60 * 1000, namespace = 'default') {
// ZKOP pulls the overlay configuration from Pensieve every 5 seconds