Database Issues during import (1.5GB JTL file) #305
Comments
Hi @kierangirvan, another thing to consider is the
@ludeknovy my point above about the scheduler not clearing stale test runs can be ignored: overnight, the scheduler did clear those stale test runs.
@ludeknovy regarding the upload issue: we have historically managed to upload far larger files (4GB), so it doesn't appear to be a buffer/cache sizing issue. We will attempt to upload another test now that the stale test runs have been cleaned up (perhaps clearing them will have helped).
@kierangirvan the scheduler has a set period after which it cleans up stale test reports.
But maybe you adjusted the DB settings back then? Have you tried adjusting those values for the current DB?
Anyway, here are a few more things to try:
The configuration of the DB has not changed in months. The issue appears to have started over the weekend when the DB was stopped (presumably AWS moving us to new hardware). As a result, we had 2 tasks (each running the DB) running for a short period, and I suspect this is where the corruption came from. We are running those queries now and will post the output to help diagnose the issue.
To check for corrupted chunks: the query didn't work exactly as you had written it, but we removed the missing column and confirmed there are no corrupted chunks:
@kierangirvan this is outside my expertise; you will need to search for each of those errors and see if that takes you somewhere.
Describe the bug
We have noticed that our most recent tests are stuck in an "in progress" state. This is visible in the UI's yellow widget, which shows the number of test runs being processed.
We have viewed the BE and DB logs and it's clear the database is having issues. It seems to get part of the way through the import before it throws a segmentation fault and tries to run a recovery. In the meantime, the BE shows that it has aborted the upload.
The backend logs show the following:
Notice the "Connection terminated unexpectedly".
Then viewing the DB logs:
Notice that the service process was terminated by signal 11 (segmentation fault).
Presumably the other messages appear because the DB was terminated.
It then seems to recover: we are able to view historical test runs and can upload smaller tests, but the 1.4GB JTL file is failing. The DB now runs on its own ECS task with 2 vCPU and 8GB.
All other containers run on another task with 2 vCPU and 4GB.
Neither of the containers looked exhausted resource-wise.
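Since smaller uploads succeed while the large file fails, one way to narrow this down might be to split the JTL into smaller parts and upload them one at a time, to see whether a particular portion of the data triggers the crash. The following is only a minimal sketch. It assumes the JTL is in JMeter's CSV format with a header row (not XML), and the function name `split_jtl` and the `rows_per_part` parameter are made up for illustration; they are not part of the reporter tool.

```python
import csv
from pathlib import Path


def split_jtl(src, out_dir, rows_per_part=1_000_000):
    """Split a CSV-format JTL file into parts that each repeat the header row.

    Hypothetical helper for diagnosis only. Returns the list of part paths.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []

    with src.open(newline="", encoding="utf-8") as fh:
        # csv.reader keeps quoted fields with embedded newlines in one row
        reader = csv.reader(fh)
        header = next(reader, None)
        if header is None:
            return parts  # empty input, nothing to split

        out = None
        writer = None
        rows_in_part = 0
        for row in reader:
            # Start a new part on the first row or when the current one is full
            if writer is None or rows_in_part >= rows_per_part:
                if out is not None:
                    out.close()
                name = f"{src.stem}.part{len(parts) + 1:03d}{src.suffix}"
                part_path = out_dir / name
                out = part_path.open("w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                parts.append(part_path)
                rows_in_part = 0
            writer.writerow(row)
            rows_in_part += 1

        if out is not None:
            out.close()

    return parts
```

Calling `split_jtl("results.jtl", "parts", rows_per_part=500_000)` would write `parts/results.part001.jtl`, `parts/results.part002.jtl`, and so on. Whether the parts should be uploaded as separate test runs is a judgment call; the aim here is only to find out if the failure follows the file size or a specific slice of the data.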