chainweb-node sometimes freezes when receiving a lot of traffic #687
Comments
Yes, this happens for me every time, within a few hours. The node stops responding on port 443. It still has open connections, but seems to die internally. Running nc -z -v localhost 443 should report the socket as open; it does not, which indicates the socket server is dead and not responding to TCP at all. Restarting the node makes it work for a little while, then it freezes again within hours. I'm on Ubuntu 18.04 using the pre-built 1.0.4 binaries. chainweb-node has only 126 open file descriptors, and the server has very large limits, so it's not that. |
From our end - we've never seen this issue on our bootstrap nodes, including ones getting hit decently hard by mining activity. My theory is that this was due to low file descriptor settings on the nodes, but perhaps people have counter evidence? For those involved, please paste here the output of the following commands on your node machines:
|
ulimit -Sn
500000
netstat -tapun | wc -l
lsof at the time, as I said, was 126 |
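For anyone comparing numbers here, a minimal sketch of how the soft limit and the open descriptor count could be read programmatically. It assumes Linux with /proc mounted; the function names are made up for illustration and are not part of chainweb-node. Note that ulimit -Sn reports the limit of the current shell, which may differ from the limit of a node started by systemd.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid):
    """Count open file descriptors of a process by listing /proc/<pid>/fd.

    Linux only. Returns None if the directory is missing or unreadable
    (for example, no /proc, or the process belongs to another user).
    """
    fd_dir = f"/proc/{pid}/fd"
    try:
        return len(os.listdir(fd_dir))
    except (FileNotFoundError, PermissionError):
        return None


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    print(f"open fds for this process: {open_fd_count(os.getpid())}")
```

To inspect the node itself, pass its PID to open_fd_count; the limits the running process actually has are listed in /proc/<pid>/limits.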
Machine 1 ulimit -Sn |
Machine 2 ulimit -Sn |
|
ulimit -Sn
|
ulimit -Sn |
This is likely the problem... This thing is consuming TWENTY THOUSAND TCP PORTS (!!!!!!)
ss -tnp | grep 443 | grep ESTAB | wc -l |
@moofone, I have observed large numbers of incoming connections from just a single (or a few) IP address(es), too. In those cases it seemed that the miner was creating too many connections. This line https://github.com/kadena-io/chainweb-miner/blob/033e0c7c27dc50f92ac98a91faeab079ebef2697/exec/Miner.hs#L362 in the miner code looks suspicious. The miner is creating a new server event stream each time it receives an update from the server. The body of [...] In any case, I think the miner shouldn't open a new server stream on each update. |
I wonder if kadena-io/chainweb-miner#7 can cause the miner to leak connections to the update stream. |
I wonder if our nodes are being attacked. Most connections come from the same IP. |
After setting up an
Run this on your node server to catch your worst offenders
I suppose it could also just be large farms. |
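The command originally posted above did not survive, so the following is a separate illustration and not that command. It is a rough sketch of one way to rank remote addresses by the number of established connections to port 443, by parsing the output of ss -tn. The function names and the sample addresses are invented for the example.

```python
import subprocess
from collections import Counter


def count_peers(ss_output, port=443):
    """Count ESTAB connections per remote address for a given local port.

    Expects the text printed by `ss -tn`, whose columns are:
    State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port.
    """
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and non-established sockets.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local, peer = fields[3], fields[4]
        # The port is whatever follows the last colon (also for IPv6).
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        counts[peer.rsplit(":", 1)[0]] += 1
    return counts


def top_offenders(limit=10, port=443):
    """Run `ss -tn` and return the `limit` busiest remote addresses."""
    out = subprocess.run(
        ["ss", "-tn"], capture_output=True, text=True, check=True
    ).stdout
    return count_peers(out, port).most_common(limit)


# Example with made-up addresses from the documentation ranges.
sample = """State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 10.0.0.5:443 203.0.113.7:51234
ESTAB 0 0 10.0.0.5:443 203.0.113.7:51235
ESTAB 0 0 10.0.0.5:443 198.51.100.2:40000
ESTAB 0 0 10.0.0.5:22 198.51.100.9:50000
"""
print(count_peers(sample).most_common())
```

Calling top_offenders() on the node server prints nothing by itself; it returns (address, count) pairs that can be logged or compared against a threshold.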
What became of this? |
This wound up being a mining client in its infancy misbehaving and imitating a kind of slow-loris attack. Not sure if anything was done to chainweb-node to prevent it since then. |
This has plagued the network for a couple of days now; people are running healthcheck scripts that curl local endpoints and restart chainweb-node if it stops responding.
From what I can tell the process is still alive, or systemd would restart it; it just seems to be hanging.
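A healthcheck of the kind described could look roughly like the sketch below. It only tests whether a TCP connection to the port succeeds, similar to nc -z, and assumes the node runs as a systemd unit named chainweb-node; that unit name and both function names are assumptions for illustration. A TCP connect can succeed while the application is hung, so an HTTP request with a timeout would be a stricter check.

```python
import socket
import subprocess


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_and_restart(host="localhost", port=443,
                      restart_cmd=("systemctl", "restart", "chainweb-node")):
    """Restart the service if the port does not accept connections.

    The default restart command assumes a systemd unit called
    "chainweb-node" (hypothetical name). Returns True if a restart
    was attempted, False if the port was reachable.
    """
    if port_is_open(host, port):
        return False
    subprocess.run(list(restart_cmd), check=False)
    return True
```

Run periodically (for example from cron or a systemd timer), check_and_restart() would only act when the port stops accepting connections.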