Illegal Instruction (core dumped) #20

Open
wjnicol opened this issue Nov 3, 2021 · 14 comments

Comments

@wjnicol

wjnicol commented Nov 3, 2021

Hello,

I installed IsoNet without any problems and can run all the preparation steps fine, with either the GUI or the command line.

When I try to start the refining step through the GUI, nothing happens. When I try through the command line, I get an "Illegal Instruction (core dumped)" error (screenshot attached). From googling the error, it seems to be a CPU issue.

NVIDIA GeForce GTX 1080 running with NVIDIA drivers 470.63.01
Intel Xeon CPU E5-2687W 3.10 GHz x 16
Ubuntu 20.04
Python 3.8.10
cuDNN v8.2.4 for cuda 11.4
GCC 9.3.0
Cuda 11.4
tensorflow 2.4.0

Thank you for your help,

Best,

William J Nicolas
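
An "Illegal instruction" crash means the process tried to execute a machine instruction the CPU does not support. With prebuilt numerical libraries, one possible cause is a binary compiled for instruction-set extensions (AVX2, FMA and so on) that the CPU lacks. The sketch below only reads the flags the kernel reports on Linux; the list of flags to look for is an assumption for illustration, not a statement about what any particular TensorFlow build requires.

```python
from pathlib import Path

# Instruction-set extensions that prebuilt numerical binaries often rely on.
# Which of them a particular build needs is an assumption to verify.
FLAGS_OF_INTEREST = ("sse4_1", "sse4_2", "avx", "avx2", "fma", "avx512f")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags):
    """Map each flag of interest to True/False depending on whether the CPU has it."""
    return {name: name in flags for name in FLAGS_OF_INTEREST}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # Linux only
        for name, present in report(parse_cpu_flags(cpuinfo.read_text())).items():
            print(f"{name:8s} {'yes' if present else 'MISSING'}")
```

A missing flag is only a hint: it narrows down which binaries could be affected, but it does not by itself identify the library that crashed.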

@procyontao
Collaborator

Hi,

I do not have an exact solution to the core dumped problem.
But could you make the versions match those shown on the TensorFlow website?
https://www.tensorflow.org/install/source#gpu

I recommend trying a Python virtual environment, such as Anaconda.
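
As a rough illustration of what "matching the versions" means, the sketch below compares an environment against the tested GPU configurations for three TensorFlow releases. The table entries are transcribed from the page linked above and should be re-checked there; the function names are made up for this example.

```python
# Tested GPU build configurations, transcribed from the table at
# https://www.tensorflow.org/install/source#gpu (check the page for updates).
TESTED = {
    "2.4.0": {"python": ("3.6", "3.8"), "cudnn": "8.0", "cuda": "11.0"},
    "2.5.0": {"python": ("3.6", "3.9"), "cudnn": "8.1", "cuda": "11.2"},
    "2.6.0": {"python": ("3.6", "3.9"), "cudnn": "8.1", "cuda": "11.2"},
}


def major_minor(version):
    """'8.2.4' -> (8, 2); only the major.minor part is compared."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def mismatches(tf_version, python, cudnn, cuda):
    """List the components that differ from the tested configuration."""
    tested = TESTED[tf_version]
    problems = []
    low, high = (major_minor(v) for v in tested["python"])
    if not low <= major_minor(python) <= high:
        problems.append(
            f"python {python} (tested: {tested['python'][0]}-{tested['python'][1]})"
        )
    for name, installed in (("cudnn", cudnn), ("cuda", cuda)):
        if major_minor(installed) != major_minor(tested[name]):
            problems.append(f"{name} {installed} (tested: {tested[name]})")
    return problems


# Versions reported in the first post of this thread.
print(mismatches("2.4.0", python="3.8.10", cudnn="8.2.4", cuda="11.4"))
# ['cudnn 8.2.4 (tested: 8.0)', 'cuda 11.4 (tested: 11.0)']
```

A mismatch with the tested table does not prove that a combination fails, only that it is not one of the combinations TensorFlow lists as tested.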

@wjnicol
Author

wjnicol commented Nov 5, 2021

Hello,

What exactly do you mean by creating a python virtual environment? Similar to how EMAN2 is installed?

I will look into the versions, but so far I cannot find a combination that fits my specs.

@wjnicol
Author

wjnicol commented Nov 5, 2021

So I installed the most recent tensorflow instead, 2.6.0, and I have made progress in the sense that I now get a bunch of error messages:

11-05 14:34:20, INFO
######Isonet starts refining######

11-05 14:34:21, ERROR Traceback (most recent call last):
File "/home/wjnicol/Repo/IsoNet/bin/refine.py", line 25, in run
run_whole(args)
File "/home/wjnicol/Repo/IsoNet/bin/refine.py", line 106, in run_whole
from IsoNet.training.predict import predict
File "/home/wjnicol/Repo/IsoNet/training/predict.py", line 4, in <module>
from tensorflow.keras.models import load_model
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/api/_v2/keras/__init__.py", line 10, in <module>
from keras import __version__
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/__init__.py", line 25, in <module>
from keras import models
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/models.py", line 20, in <module>
from keras import metrics as metrics_module
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/metrics.py", line 26, in <module>
from keras import activations
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/activations.py", line 20, in <module>
from keras.layers import advanced_activations
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/layers/__init__.py", line 23, in <module>
from keras.engine.input_layer import Input
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/engine/input_layer.py", line 21, in <module>
from keras.engine import base_layer
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/engine/base_layer.py", line 43, in <module>
from keras.mixed_precision import loss_scale_optimizer
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/mixed_precision/loss_scale_optimizer.py", line 18, in <module>
from keras import optimizers
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/optimizers.py", line 26, in <module>
from keras.optimizer_v2 import adadelta as adadelta_v2
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/optimizer_v2/adadelta.py", line 22, in <module>
from keras.optimizer_v2 import optimizer_v2
File "/home/wjnicol/.local/lib/python3.8/site-packages/keras/optimizer_v2/optimizer_v2.py", line 36, in <module>
keras_optimizers_gauge = tf.__internal__.monitoring.BoolGauge(
File "/home/wjnicol/.local/lib/python3.8/site-packages/tensorflow/python/eager/monitoring.py", line 360, in __init__
super(BoolGauge, self).__init__('BoolGauge', _bool_gauge_methods,
File "/home/wjnicol/.local/lib/python3.8/site-packages/tensorflow/python/eager/monitoring.py", line 135, in __init__
self._metric = self._metric_methods[self._label_length].create(*args)
tensorflow.python.framework.errors_impl.AlreadyExistsError: Another metric with the same name already exists.
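
The traceback shows `keras` being loaded as a separate package from `~/.local/lib/python3.8/site-packages`. One commonly reported trigger for this `AlreadyExistsError` is a standalone `keras` whose version does not match the installed TensorFlow, or two TensorFlow distributions installed side by side. That is a possibility to check, not a confirmed diagnosis of this report. A minimal sketch for listing what is installed without importing anything (so it cannot hit the same error); the package list is an assumption:

```python
from importlib import metadata  # standard library since Python 3.8

PACKAGES = ("tensorflow", "tensorflow-gpu", "keras", "tensorflow-estimator")


def installed_versions(names=PACKAGES):
    """Return {package: version or None} without importing the packages."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def same_minor(a, b):
    """True when two version strings share major.minor, e.g. 2.6.0 and 2.6.2."""
    return a.split(".")[:2] == b.split(".")[:2]


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name:22s} {version or 'not installed'}")
    tf_version = found["tensorflow"] or found["tensorflow-gpu"]
    if tf_version and found["keras"] and not same_minor(tf_version, found["keras"]):
        print("keras and tensorflow differ in major.minor version")
```

Run it with the same interpreter that runs IsoNet, since packages under `~/.local` are visible to that interpreter regardless of which directory the script is started from.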

@wjnicol
Author

wjnicol commented Nov 5, 2021

I think I did not install tensorflow properly. I followed the instructions you provided (pip install tensorflow-gpu==2.6.0), but the page you linked for checking compatibility describes an installation with many more steps. Should I do a full installation of tensorflow, or is the command you provided enough?

Thanks,

@procyontao
Collaborator

Hi,

I am sorry that you have to deal with these problems.
We have encountered a lot of problems when versions do not match those shown on the website.

What you can do is either:
Download the packages from https://developer.nvidia.com/cuda-toolkit and https://developer.nvidia.com/cudnn and install them.

Or download Anaconda: https://www.anaconda.com/

Here are commands for my recent installation:
conda create --name tf2.5
conda activate tf2.5
conda install python=3.6
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1
pip install tensorflow==2.5
pip install fire mrcfile tqdm scipy scikit-image
export HDF5_USE_FILE_LOCKING=FALSE
export PATH=/home/lytao/software/IsoNet/bin:$PATH
export PYTHONPATH=/home/lytao/software:$PYTHONPATH

Hope that would help.
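
After an installation like the one above, a quick way to confirm that TensorFlow imports and sees the GPU, before launching IsoNet, is to list the physical devices. This is a minimal sketch using TensorFlow's public `tf.config.list_physical_devices` call; the wrapper function is made up for this example.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not importable in this environment")
    elif not gpus:
        print("TensorFlow imported, but no GPU is visible")
    else:
        print("Visible GPUs:", ", ".join(gpus))
```

Run it inside the activated conda environment (`tf2.5` in the commands above) so that the check uses the same interpreter and libraries as IsoNet.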

@wjnicol
Author

wjnicol commented Nov 5, 2021

OK, I am trying this right now. After this, should I launch isonet.py gui from the tf2.5 environment?

@wjnicol
Author

wjnicol commented Nov 5, 2021

OK, so this works (I didn't do the last two exports to the path because I had already done that previously). I did, however, need to run pip install PyQt5 after your commands. From there, isonet.py gui works fine and refining works! However, it seems to use all 16 cores at 100%, and then it suddenly crashes my computer, which reboots. By crashing I mean a sudden black screen followed by a reboot. It's really weird. I tried it twice.

@wjnicol
Author

wjnicol commented Nov 5, 2021

Additional information: I am trying this on 3 bin4 tomograms, ~1k each...

@procyontao
Collaborator

Thank you for reporting this. There is a parameter that specifies how many CPUs to use in the preprocessing step.

@procyontao
Collaborator

I suggest you start with the tutorial dataset to observe the behavior of the program.

@wjnicol
Author

wjnicol commented Nov 5, 2021

Even when I use 8 threads, with either the sample data or my data, it does the same thing: the computer turns off.
My CPU has 8 cores with two threads each. Am I asking for too much even when I request 8 CPUs? I will try with one.
Do you know of a log file in Linux that reports crashes and hardware issues? I'm wondering if your software is just too demanding for my computer.
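
On the question of how many CPUs is "too much": a CPU with 8 cores and two threads per core shows up as 16 logical CPUs, so a request for 8 workers is at most one per physical core. The sketch below is one simple heuristic for choosing a conservative worker count; `threads_per_core` and `reserve` are assumptions for illustration and are not IsoNet parameters.

```python
import os


def usable_logical_cpus():
    """Logical CPUs this process may run on (16 on an 8-core, 2-thread-per-core CPU)."""
    if hasattr(os, "sched_getaffinity"):  # Linux
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def conservative_workers(requested, logical_cpus, threads_per_core=2, reserve=1):
    """Cap a requested worker count at the physical core count minus a reserve.

    threads_per_core=2 is an assumption (hyper-threading enabled); reserve keeps
    some cores free for the desktop and other processes.
    """
    physical = max(1, logical_cpus // threads_per_core)
    return max(1, min(requested, physical - reserve))


if __name__ == "__main__":
    logical = usable_logical_cpus()
    print("logical CPUs:", logical)
    print("suggested workers:", conservative_workers(requested=8, logical_cpus=logical))
```

With 16 logical CPUs and the defaults above, a request for 8 workers is capped at 7.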

@wjnicol
Author

wjnicol commented Nov 5, 2021

I think it's making my system crash:

wjnicol@caliban:~$ last -x | head | tac
wjnicol :1 :1 Fri Nov 5 15:55 - crash (00:07)
reboot system boot 5.11.0-37-generi Fri Nov 5 16:02 still running
runlevel (to lvl 5) 5.11.0-37-generi Fri Nov 5 16:03 - 16:25 (00:21)
wjnicol :1 :1 Fri Nov 5 16:03 - crash (00:21)
reboot system boot 5.11.0-37-generi Fri Nov 5 16:24 still running
runlevel (to lvl 5) 5.11.0-37-generi Fri Nov 5 16:25 - 16:46 (00:21)
wjnicol :1 :1 Fri Nov 5 16:25 - crash (00:20)
reboot system boot 5.11.0-37-generi Fri Nov 5 16:46 still running
runlevel (to lvl 5) 5.11.0-37-generi Fri Nov 5 16:46 still running
wjnicol :1 :1 Fri Nov 5 16:46 still logged in
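
To see how long each session ran before the machine went down, the crash entries in the `last -x` output above can be pulled out with a small parser. This is a sketch that assumes the `- crash (HH:MM)` layout shown in the listing (with an optional `days+` prefix); the sample lines are copied from that listing.

```python
import re

# Matches session lines that `last -x` marks as ended by a crash,
# e.g. "wjnicol :1 :1 Fri Nov 5 15:55 - crash (00:07)".
CRASH_RE = re.compile(r"-\s+crash\s+\((?:(\d+)\+)?(\d+):(\d+)\)")


def crash_uptimes_minutes(last_output):
    """Return the length in minutes of every session that ended in a crash."""
    minutes = []
    for line in last_output.splitlines():
        match = CRASH_RE.search(line)
        if match:
            days, hours, mins = (int(g) if g else 0 for g in match.groups())
            minutes.append((days * 24 + hours) * 60 + mins)
    return minutes


sample = """\
wjnicol :1 :1 Fri Nov 5 15:55 - crash (00:07)
reboot system boot 5.11.0-37-generi Fri Nov 5 16:02 still running
wjnicol :1 :1 Fri Nov 5 16:03 - crash (00:21)
wjnicol :1 :1 Fri Nov 5 16:25 - crash (00:20)
"""
print(crash_uptimes_minutes(sample))  # [7, 21, 20]
```

Here the three sessions lasted 7, 21 and 20 minutes before the crash.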

@wjnicol
Author

wjnicol commented Nov 6, 2021

Sorry for bombarding you with messages, but I will be away from my workstation for 2 weeks and am trying to give you as much info as possible.

From this page, https://unix.stackexchange.com/questions/9819/how-to-find-out-from-the-logs-what-caused-system-shutdown , I found a way to get logs on why my computer shuts down:

wjnicol@caliban:~$ grep -iv ': starting\|kernel: .*: Power Button\|watching system buttons\|Stopped Cleaning Up\|Started Crash recovery kernel' \
/var/log/messages /var/log/syslog /var/log/apcupsd* \
| grep -iw 'recover[a-z]*\|power[a-z]*\|shut[a-z ]*down\|rsyslogd\|ups'
grep: /var/log/messages: No such file or directory
/var/log/syslog:Nov 5 15:54:51 caliban apparmor.systemd[1012]: Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
/var/log/syslog:Nov 5 15:54:51 caliban systemd[1]: Finished Update UTMP about System Boot/Shutdown.
/var/log/syslog:Nov 5 15:54:51 caliban systemd[1]: Finished Restore /etc/resolv.conf if the system crashed before the ppp link was shut down.
/var/log/syslog:Nov 5 15:54:51 caliban rsyslogd: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd. [v8.2001.0]
/var/log/syslog:Nov 5 15:54:51 caliban rsyslogd: rsyslogd's groupid changed to 110
/var/log/syslog:Nov 5 15:54:51 caliban rsyslogd: rsyslogd's userid changed to 104
/var/log/syslog:Nov 5 15:54:51 caliban rsyslogd: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1063" x-info="https://www.rsyslog.com"] start
/var/log/syslog:Nov 5 15:54:51 caliban kernel: [ 0.585685] pci 0000:05:00.1: D0 power state depends on 0000:05:00.0
/var/log/syslog:Nov 5 15:54:51 caliban kernel: [ 8.840032] EXT4-fs (nvme0n1): recovery complete
/var/log/syslog:Nov 5 15:54:51 caliban kernel: [ 10.038314] EXT4-fs (sdc): recovery complete
/var/log/syslog:Nov 5 15:54:51 caliban kernel: [ 11.837374] EXT4-fs (sdb1): recovery complete
/var/log/syslog:Nov 5 15:54:51 caliban dbus-daemon[1043]: dbus[1043]: Unknown group "power" in message bus configuration file
/var/log/syslog:Nov 5 15:54:51 caliban thermald[1075]: Need Linux PowerCap sysfs
/var/log/syslog:Nov 5 15:54:51 caliban NetworkManager[1044]: [1636152891.6834] Read config: /etc/NetworkManager/NetworkManager.conf (lib: 10-dns-resolved.conf, 20-connectivity-ubuntu.conf, no-mac-addr-change.conf) (run: 10-globally-managed-devices.conf) (etc: default-wifi-powersave-on.conf)
/var/log/syslog:Nov 5 15:54:51 caliban systemd[1]: Started Unattended Upgrades Shutdown.
/var/log/syslog:Nov 5 15:54:55 caliban systemd[1]: Started Daemon for power management.
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) config/udev: Adding input device Power Button (/dev/input/event1)
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) Using input driver 'libinput' for 'Power Button'
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (**) Power Button: always reports core events
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: device is a keyboard
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: device removed
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 6)
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: device is a keyboard
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) config/udev: Adding input device Power Button (/dev/input/event0)
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) Using input driver 'libinput' for 'Power Button'
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (**) Power Button: always reports core events
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: device is a keyboard
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: device removed
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 7)
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Nov 5 15:54:56 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: device is a keyboard
/var/log/syslog:Nov 5 15:55:07 caliban kernel: [ 27.621060] systemd-journald[411]: File /var/log/journal/6af7e9060f66425b8aafcb55c60d336b/user-2011.journal corrupted or uncleanly shut down, renaming and replacing.
/var/log/syslog:Nov 5 15:55:07 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event1 - Power Button: device removed
/var/log/syslog:Nov 5 15:55:07 caliban /usr/lib/gdm3/gdm-x-session[1382]: (II) event0 - Power Button: device removed
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) config/udev: Adding input device Power Button (/dev/input/event1)
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) Using input driver 'libinput' for 'Power Button'
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (**) Power Button: always reports core events
grep: /var/log/apcupsd*: No such file or directory
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event1 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event1 - Power Button: device is a keyboard
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event1 - Power Button: device removed
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 6)
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event1 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event1 - Power Button: device is a keyboard
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) config/udev: Adding input device Power Button (/dev/input/event0)
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) Using input driver 'libinput' for 'Power Button'
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (**) Power Button: always reports core events
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event0 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event0 - Power Button: device is a keyboard
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event0 - Power Button: device removed
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 7)
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event0 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Nov 5 15:55:08 caliban /usr/lib/gdm3/gdm-x-session[1859]: (II) event0 - Power Button: device is a keyboard
/var/log/syslog:Nov 5 15:55:08 caliban systemd[1759]: gnome-session-pre.target: Requested dependency OnFailure=gnome-session-shutdown.target ignored (target units cannot fail).
/var/log/syslog:Nov 5 15:55:08 caliban systemd[1759]: gnome-session-initialized.target: Requested dependency OnFailure=gnome-session-shutdown.target ignored (target units cannot fail).
/var/log/syslog:Nov 5 15:55:08 caliban systemd[1759]: gnome-session-failed.target: Requested dependency OnFailure=gnome-session-shutdown.target ignored (target units cannot fail).
/var/log/syslog:Nov 5 15:55:10 caliban systemd[1759]: Started GNOME Power management handling.
/var/log/syslog:Nov 5 15:55:10 caliban systemd[1759]: Reached target GNOME Power management handling.

@procyontao
Collaborator

At least for the tutorial dataset, we often use 20 CPUs and four 1080 Ti GPUs, and we have not observed any such error or crash. I think you can test with a much smaller dataset, e.g. 20 subtomos.

I do not know how to interpret those logs. I will let you know when I have some idea.

If you can, please let me know the commands you use to run IsoNet. If you are using the GUI, please click print command.
