Connection stuck when `sendmsg` call completes with `EPIPE` on Darwin. #2011
Comments
cc @mxinden might be of interest.
Many UDP sendmsg failures are transient (e.g. due to wifi dropping out, switching networks, or even spoofed ICMP messages), so we don't generally want to act on them too eagerly, but perhaps this case merits special treatment.
While technically feasible, this seems tricky from an ergonomics standpoint. Users should be able to get reasonable behavior by default without having to know about this. Would it make sense for us to try to create a new socket at the same address whenever we get an error like that? How do we square that with user-provided sockets that might have other attached state we don't know about? What happens to a TCP connection on iOS under the same circumstances?
I think this is likely to lead to people reimplementing the existing idle timeout mechanism, or inadvertently degrading it. We should try to handle this type of issue automatically.
This is an interesting point. Even for transient errors, it's not constructive to sit around waiting for a response that we could've inferred won't happen. We'll need to tweak quinn-proto's API to support this, though. Maybe a |
I believe every connection is in a broken state after the app was in the background, and they are recreated. Here is how it looks in logs provided by system frameworks: see the logs just after the app was brought back to the foreground (a lot of errors saying that previously created connections are closed)
Would be interesting, too, to understand what happens on Android in these kinds of scenarios. As for "should be able to get reasonable behavior by default", it seems like the current behavior works well outside of mobile, so maybe we should have some kind of wrapper
I expect to investigate Android next week and report findings here. So far our QA has reported that the connection also does not work after BG until a long timeout expires (probably the idle timeout). But I guess Android has the problem for a different technical reason. Speculatively, Android might have a trickier problem, which is that timers are not advanced in the background (tokio-rs/tokio#3185, rust-lang/rust#71860), but as I've said, I haven't had a chance to debug the problem myself and will likely do it next week.
@mstyura who's we, by the way? Always good to understand our downstream consumer's use cases.
Are they recreated automatically, or is it the application's responsibility? It's interesting to note that Apple explicitly instructs application developers to release resources upon entering the background. Is it typical for libraries to handle that internally? It might be most idiomatic for your application layer to manually tear down/rebuild networking resources from the top level.
It very much depends on the type of connection. According to the observed logs, connections maintained under the hood by URLSession (Apple's HTTP request API) fail and are re-created automatically by the system framework providing the HTTP API. Manually created connections like NWConnection get an event that they are in a failed state after BG, and it's the app's responsibility to recover them. Regarding releasing resources in the background, I believe Apple's recommendations are mostly about a different story: their main concern is releasing memory when the app is not visible to the user, so more apps can be preserved in memory at the same time, allowing faster switching between them. It's common for an app on iOS to have some network communication after it's already minimized, and you can handle that, but you cannot handle the point when the OS stops executing any thread of the app. Whether or not a library handles socket disruption automatically is very much dependent on the library. E.g. for an HTTP client library like the system-provided URLSession, it's natural to expect it to handle reconnection automatically. For lower-level sockets initiated by the user of the library, I expect some kind of event when the socket becomes broken, and it's the library user's responsibility to handle recovery. Even now I believe Quinn is extensible enough that I can provide a custom AsyncUdpSocket which monitors EPIPE and signals back to my code to handle recovery. So far I've managed to handle stuck connections by re-creating them on EPIPE, which sounds OK but not perfect. It feels like a better recovery could be to try rebind_abstract to make quic try to resume the existing connection, but I've not yet completed the implementation of this to see if it actually works like this. Even if quinn never provides any kind of signal to recover a broken UDP socket by default, I would expect it to fail faster by default in such a case, e.g. by emitting an internal event on I/O error from
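To make the "monitor EPIPE and signal back" idea concrete, here is a minimal sketch of the classification step only, using nothing but the Rust standard library. The names `SocketFault` and `classify_io_error` are hypothetical and are not part of quinn; the sketch also assumes that `EPIPE` and `ENOTCONN` surface as `io::ErrorKind::BrokenPipe` and `io::ErrorKind::NotConnected`, which is how the standard library maps them on Unix-like targets.

```rust
use std::io;

/// Hypothetical verdict for an error returned by a UDP send or receive call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SocketFault {
    /// The socket itself looks dead (EPIPE / ENOTCONN); retrying on the
    /// same socket is unlikely to help.
    Defunct,
    /// Anything else: possibly transient (network change, ICMP, ...),
    /// so keep using the socket.
    MaybeTransient,
}

/// Hypothetical helper: decide whether an I/O error suggests the socket
/// has become unusable, as described in this issue for Darwin.
pub fn classify_io_error(err: &io::Error) -> SocketFault {
    match err.kind() {
        io::ErrorKind::BrokenPipe | io::ErrorKind::NotConnected => SocketFault::Defunct,
        _ => SocketFault::MaybeTransient,
    }
}
```

A custom socket wrapper could call such a helper on every failed send and notify the application only for the `Defunct` case, leaving other errors to the existing loss-recovery and idle-timeout logic.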
Thanks, that's useful context. Because we're (presumably?) not in a good position to detect when the connection can be resumed, and rebuilding UDP sockets internally is tricky, I see two reasonable paths forwards:
In one of the attempts to find workaround I've tried to return Line 67 in 41850c8
Currently the assumption is that the connection driver should never fail. Revisiting that does seem like a reasonable implementation strategy for this new class of errors. It's also probably a bad thing that the I/O driver failing causes connections to get stuck indefinitely regardless.
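One way to reconcile "don't act on send errors too eagerly" with "don't stay stuck until the idle timeout" is to count consecutive socket-level failures and give up only past a threshold. The sketch below is an illustration under that assumption, not quinn's behavior; `SendHealth` and its threshold are hypothetical names.

```rust
use std::io;

/// Hypothetical tracker of consecutive "socket looks dead" send failures.
#[derive(Debug)]
pub struct SendHealth {
    consecutive_fatal: u32,
    threshold: u32,
}

impl SendHealth {
    /// `threshold` is the number of consecutive EPIPE/ENOTCONN-style
    /// failures tolerated before the socket is reported as broken.
    pub fn new(threshold: u32) -> Self {
        Self { consecutive_fatal: 0, threshold }
    }

    /// Record the outcome of one send attempt.
    /// Returns `true` once the socket should be treated as broken.
    pub fn record(&mut self, outcome: &io::Result<()>) -> bool {
        match outcome {
            // Any successful send resets the counter.
            Ok(()) => self.consecutive_fatal = 0,
            // Socket-level failures count towards the threshold.
            Err(e) if matches!(
                e.kind(),
                io::ErrorKind::BrokenPipe | io::ErrorKind::NotConnected
            ) => {
                self.consecutive_fatal = self.consecutive_fatal.saturating_add(1);
            }
            // Other errors (e.g. WouldBlock) neither count nor reset.
            Err(_) => {}
        }
        self.consecutive_fatal >= self.threshold
    }
}
```

With a small threshold, a single spurious error is ignored, while a socket that fails on every send is reported long before an idle timeout would fire.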
On Android, the problem with sockets in the background turned out to be unrelated to timers. Whenever the device screen is locked while the app was in the foreground, the next attempt to call
The new interface proposed in #2018 might help accelerate recovery, though it might also be much more pessimistic than the impact of the limited amount of loss that can occur before quinn is backpressured by the unwritable socket.
Steps to reproduce
1. […] `quinn` to establish `quic` connection to remote server in context of `iOS` application via code like this (only UDP socket configuration provided):
2. […] `iOS` application and lock the phone screen while previously established quic connection was alive;
3. […] `iOS` application and try to use previous quic connection such that `quinn` sends some packets.

Actual result
- `quinn` is unable to send any UDP packet over the previously constructed UDP socket. It receives a "Broken Pipe" error from `sendmsg`;
- None of the IP packets leave the device;
- […] `sendmsg` is called and fails, prolonging automatic detection of the broken socket;
- `quinn` connection is closed by the local side once the idle timeout is reached.

Expected result
- […] `quinn` connection is not usable anymore.
- […] `Endpoint::rebind` once problem with socket detected.
- […] `Connection` is exposed, like explicit `ping` method which can return `io::Error` from underlying UDP socket and user code can trigger rebind or complete `Endpoint/Connection`.
- […] `sendmsg` call failed.

More details
I've tracked down the `EPIPE` error from `sendmsg` inside the `XNU` kernel.
The backtrace to the likely origin in the XNU source code:
It looks like the socket state contains `SOF_DEFUNCT` or `SS_CANTSENDMORE`, so it has practically become unusable.
According to the `FreeBSD` documentation, `EPIPE` is also an indicator of "you can not send more data via provided socket".
See: https://man.freebsd.org/cgi/man.cgi?send(2)
UPD: The same "broken" state of the socket produces an `ENOTCONN` error when called with `recvmsg`; the origin of it seems to be this check in function soreceive.
UPD 2: Seems `ENOTCONN` should also have some special treatment on Darwin: https://github.com/libevent/libevent/pull/1031/files