System failures with 12 concurrent Qs #42

Open
sharatisrani opened this issue Aug 29, 2023 · 4 comments
Labels
critical important to fix

Comments

@sharatisrani
Collaborator

sharatisrani commented Aug 29, 2023

These 12 Q's were run concurrently (8 MVP1, 4 MVP2) to see the scores that came back from the ARS. Many failures resulted. It is possible that subsystems failed, as there is much new code under O&O, such as the appraiser, novelty calculations, etc. All were run on CI.

Barth's Disease. "pk": "b122cf57-459d-4fc7-a907-f3223a81e067",
Familial Insomnia "pk": "b74c5679-9f5d-4a26-8236-bf26d745ffd6",
Mastocytosis "pk": "42e7bc51-05c6-4691-ab64-84d7c66adfc4",
Systemic sclerosis "pk": "808ef931-7047-4f43-b419-25a6b97b900b",
Ehlers-D "pk": "74e04721-bc59-473c-9749-bc23ae454f54",
Gauchers Disease "pk": "46d3dc1e-8ef4-4f44-a14d-e472fe026912",
Nemaline "pk": "ff7fd0d9-8a74-4743-963b-7f36d68fad4a",
T2D "pk": "44a6949f-73f8-4c45-9953-35a35c233098",

BRCA1 "pk": "8dea69f3-4277-447c-8175-4acfae549026",
PCSK9 "pk": "ce4f2375-e6f5-413f-90e8-a4769ddce08d",
MUC5B "pk": "b35b849e-d8d4-4182-b6ea-af8c615f9ef8",
SMARCE1 "pk": "fe5af7af-cea6-48f7-a668-44d5bc9fae8f",
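
For anyone wanting to reproduce a concurrent run like the one above, here is a minimal sketch. It is not the actual test harness: `run_concurrently` and the `submit` callable are hypothetical names, and `submit` is assumed to send one query and return an HTTP status code. How a query is actually submitted to the ARS is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_concurrently(
    labels: Iterable[str],
    submit: Callable[[str], int],  # hypothetical: sends one query, returns a status code
    max_workers: int = 12,
) -> Dict[str, str]:
    """Submit every query at once and record one outcome per label."""
    outcomes: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its query label so failures can be attributed.
        futures = {pool.submit(submit, label): label for label in labels}
        for future in as_completed(futures):
            label = futures[future]
            try:
                outcomes[label] = f"status {future.result()}"
            except Exception as exc:  # keep going so one failure does not hide the rest
                outcomes[label] = f"failed: {exc}"
    return outcomes
```

Recording an outcome per label, rather than stopping at the first exception, makes it easier to see which of the 12 queries failed in a given run.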

Thank you @maximusunc for looking into this - pls reassign based on what you find.

@sharatisrani sharatisrani added the critical important to fix label Aug 30, 2023
@sharatisrani
Collaborator Author

Making it critical based on discussion at UI checkin this morning.

@maximusunc
Collaborator

@sharatisrani please try running these queries again. I believe the issue was that the Answer Appraiser was running out of memory and giving the ARS back 502 errors, and then the ARS was reporting a 422. We have bumped up the resource allocation of the Appraiser and I've made sure that it can handle 12 concurrent queries.
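
Since an upstream 502 ended up being reported as a 422, a small triage helper might make that kind of masking easier to spot in logs. This is only a sketch: `classify_upstream_status` and its category names are made up for illustration and are not part of the ARS or the Appraiser.

```python
def classify_upstream_status(status: int) -> str:
    """Bucket an HTTP status code from an upstream service for triage.

    Hypothetical helper; the category names are illustrative only.
    """
    if 200 <= status < 300:
        return "ok"
    if status == 429:
        # Too Many Requests: the caller is being rate limited.
        return "rate_limited"
    if 500 <= status < 600:
        # e.g. 502 Bad Gateway when a service behind a proxy has died.
        return "upstream_error"
    if 400 <= status < 500:
        # e.g. 422 Unprocessable Entity: the request body was rejected.
        return "client_error"
    return "unknown"
```

Logging the upstream category next to the status that is finally reported would show when a server-side failure (such as running out of memory) is surfacing as a client-side error.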

@sharatisrani sharatisrani added showstopper Something essential isn't working and removed critical important to fix labels Aug 30, 2023
@sharatisrani
Collaborator Author

sharatisrani commented Aug 30, 2023

Tested again, on CI. There's obvious progress. But still 3 out of 12 gave 422 failures.

With this progress, I also saw some ARAs that were routinely returning 0 results, which I doubt they wanted to be the case. There were other specifics as well, such as Error 429s from ARAX for MVP2. So I have opened several other issues. All are marked showstopper until the decider (Tyler perhaps) decides they are not showstoppers. All will be discussed in the O&O WG on Aug 31.

The 422 failures may be tied to the ARA failures described above. Hard to know. E.g., they always occurred when ARAGORN and BTE both returned 0 results. Is that a coincidence, or what?

Here are the 12 pks. The error 422s were Sclerosis, Ehlers-D and Gauchers.

MVP1
Barth "pk": "3b13c351-2c99-48d0-98c3-719fa9b548c9",
FFI "pk": "2eda4d97-df7e-4f9e-810b-12506f78f284",
Mastocytosis "pk": "fef324ad-cc9d-40f3-b1c8-205a5e78efda",
Sclerosis "pk": "41b67f5a-2056-438b-bcbf-2e5b3104ffe8",
Ehlers-D "pk": "a893586e-ab8c-4b74-b72e-995f43f02d92",
Gaucher "pk": "0338d798-fc30-4ae5-9f4c-a5364ec6153e",
Nemaline "pk": "0539f51b-6f89-4efe-9850-9001cb73a911",
T2D "pk": "c12a2b3d-7646-47e6-a656-49842a043585"

MVP2
BRCA1 "pk": "2ce4178b-c494-4ba2-b877-d6bce33f9f7e"
PCSK9 "pk": "1a89b7d4-f672-4c33-946d-76395383797f",
MUC5B "pk": "682f8ee2-5098-40e8-ac70-79c31e6084ed",
SMARCE1 "pk": "3cf9ca65-0e76-4bd6-ab95-51f6cd8ac603",

Based on the UI checkin meeting this morning, I am converting this to ShowStopper. I think it can be rapidly resolved.

@maximusunc
Collaborator

The 3 failures are coming from the node annotator service (I think it's managed by BTE?), which is returning something other than JSON. @sharatisrani could you please open a ticket either in the Feedback repo or on BTE itself wrt this issue?
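
For reference, a caller can guard against a non-JSON body along these lines. This is a sketch using only the Python standard library; `parse_json_or_report` is a hypothetical name, not a function in the Appraiser or in BTE, and the real annotator response format is not assumed here.

```python
import json
from typing import Any, Optional, Tuple


def parse_json_or_report(body: str) -> Tuple[Optional[Any], Optional[str]]:
    """Return (data, None) on success, or (None, error message) if body is not JSON."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as exc:
        # Include a short snippet so the log shows what actually came back
        # (for example an HTML error page instead of JSON).
        snippet = body[:80]
        return None, f"non-JSON response ({exc.msg}): {snippet!r}"
```

Returning the error together with a snippet of the body, instead of letting the decode exception propagate, would make it clearer in a ticket what the service sent back.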

@sharatisrani sharatisrani added critical important to fix and removed showstopper Something essential isn't working labels Aug 31, 2023