
chainweb-node sometimes freezes when receiving a lot of traffic #687

Open
edmundnoble opened this issue Nov 11, 2019 · 16 comments

@edmundnoble
Contributor

edmundnoble commented Nov 11, 2019

This has plagued the network for a couple of days now; people are running healthcheck scripts that curl local endpoints and restart chainweb-node if it stops responding.

From what I can tell the process is still alive, or systemd would restart it; it just seems to be hanging.
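For reference, a minimal sketch of such a watchdog (the health-check URL, the timeout, and the systemd unit name are assumptions; adjust to your setup):

#!/bin/sh
# Restart the node if a local HTTPS probe gets no answer within 10 seconds.
if ! curl -sk --max-time 10 https://localhost:443/health-check > /dev/null; then
    systemctl restart chainweb-node
fi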

@moofone

moofone commented Nov 15, 2019

Yes, this happens for me within a few hours, every time. The node stops responding on port 443. It still has open connections but seems to die internally.

nc -z -v localhost 443

^ this should reply that the socket is open; instead the server is dead, not responding to TCP at all. Restarting the node makes it work for a little while, then it freezes again within hours.

Ubuntu 18.04, using the pre-built 1.0.4 binaries.

chainweb-node has only 126 open file descriptors, and the server has very large limits, so it's not that. There is also plenty of disk space and free RAM (18+ GB available).
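A bare TCP check can't tell a dead listener from a wedged application, so it helps to probe both layers. A sketch, assuming the node serves the standard chainweb API on port 443 (the endpoint path may differ by node version):

nc -z -v localhost 443
# ^ layer 4: does the listener accept connections at all?
curl -sk --max-time 10 https://localhost:443/chainweb/0.0/mainnet01/cut
# ^ layer 7: does the application still answer requests?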

@fosskers
Contributor

fosskers commented Nov 15, 2019

From our end, we've never seen this issue on our bootstrap nodes, including ones getting hit decently hard by mining activity. My theory is that this was due to low file-descriptor limits on the nodes, but perhaps people have counter-evidence?

For those involved, please paste here the output of the following commands on your node machines:

ulimit -Sn
netstat -tapun | wc -l
lsof -p $(pgrep chainweb-node) | wc -l

@moofone

moofone commented Nov 15, 2019

ulimit -Sn

500000

netstat -tapun | wc -l
2151

lsof at the time, as I said, was 126.

@tylersisia

Machine 1

ulimit -Sn
1024
netstat -tapun | wc -l
117
lsof -p $(pgrep chainweb-node) | wc -l
217

@tylersisia

Machine 2

ulimit -Sn
1024
netstat -tapun | wc -l
111
lsof -p $(pgrep chainweb-node) | wc -l
112

@pkrasam

pkrasam commented Nov 16, 2019

ulimit -Sn
1024000

netstat -tapun | wc -l
61

lsof -p $(pgrep chainweb-node) | wc -l
198

@huglester

huglester commented Nov 16, 2019

ulimit -Sn
1024
netstat -tapun | wc -l
282
lsof -p $(pgrep chainweb-node) | wc -l
416

I changed ulimit to 65535 - we will see
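Note that a shell-level ulimit change won't survive a service restart. A sketch of making it persistent for a systemd-managed node (the unit name chainweb-node is an assumption):

# Override the file-descriptor limit for the service and apply it.
mkdir -p /etc/systemd/system/chainweb-node.service.d
printf '[Service]\nLimitNOFILE=65535\n' > /etc/systemd/system/chainweb-node.service.d/limits.conf
systemctl daemon-reload
systemctl restart chainweb-node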

@xdagx

xdagx commented Nov 16, 2019

ulimit -Sn
655350
netstat -tapun | wc -l
15121
lsof -p $(pgrep chainweb-node) | wc -l
15065

@moofone

moofone commented Nov 16, 2019

This is likely the problem... This thing is holding TWENTY THOUSAND established TCP connections (!!!!!!)

ss -tnp | grep 443 | grep ESTAB | wc -l
20077

@larskuhtz
Contributor

larskuhtz commented Nov 17, 2019

@moofone, I have observed large numbers of incoming connections from just a single (or a few) IP address(es), too. In those cases it seemed that the miner was creating too many connections.

This line https://github.com/kadena-io/chainweb-miner/blob/033e0c7c27dc50f92ac98a91faeab079ebef2697/exec/Miner.hs#L362 in the miner code looks suspicious. The miner is creating a new server event stream each time it receives an update from the server. The body of withEvent doesn't seem to inspect the content of the stream, and I wonder if it leaves the connection in a dirty state so that it can't be reused, or if it may even leak the connection.

In any case, I think the miner shouldn't open a new server stream on each update.
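A quick way to check whether one peer dominates, assuming the node listens on 443 (adjust the port; re-run periodically and watch for a count from one IP that only grows):

# Count established connections per remote IP on the node's listen port.
ss -tn state established '( sport = :443 )' | awk 'NR>1 {print $4}' | cut -d: -f1 | sort | uniq -c | sort -rn | head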

@larskuhtz
Contributor

I wonder if kadena-io/chainweb-miner#7 can cause the miner to leak connections to the update stream.

@Jacoby6000

Jacoby6000 commented Nov 19, 2019

> ss | grep ESTAB | grep 35000 | grep 84.42.23.120 | wc -l
813
> ss | grep ESTAB | grep 35000 |  wc -l
845

I wonder if our nodes are being attacked. Most connections come from the same IP.

@Jacoby6000

Jacoby6000 commented Nov 19, 2019

After setting up an ebtables rule, I don't have a problem for now. If it's an attacker, I'm sure they'll just switch IPs. Need to make a rule that drops the same IP once it holds more than n connections (see the connlimit sketch below).

ebtables -A INPUT -p IPv4 --ip-src 84.42.23.120 -j DROP

Run this on your node server to catch your worst offenders:

ss | grep ESTAB | grep ":$YOUR_NODE_PORT" | awk '{print $6}' | awk -F':' '{print $1}' | sort | uniq -c | sort

I suppose it could also just be large farms.
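For the ">n connections per IP" rule, the iptables connlimit match can do it directly; a sketch assuming the node listens on 443 and that 50 is a sane ceiling (both are assumptions, tune to your traffic):

# Drop new connection attempts from any single IPv4 address that already
# holds more than 50 connections to the node port.
iptables -A INPUT -p tcp --syn --dport 443 -m connlimit --connlimit-above 50 --connlimit-mask 32 -j DROP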

@chessai
Contributor

chessai commented Jun 13, 2023

What became of this?

@Jacoby6000

This wound up being a mining client in its infancy misbehaving and imitating a kind of slowloris attack. Not sure if anything was done to chainweb-node to prevent it since then.
