When accepting a socket connection and ERROR_NETNAME_DELETED occurs, leads this into a closing of the serving socket (BaseProactorEventLoop) #93821

Cacsjep · 2022-06-14T18:30:39Z

Hey Guys

Application Description

We discover on our uvicorn server sometimes that the listening socket is closing.
The Uvicorn get image uploads from cameras and sometimes or often releated to mobile network conditions the connection is lost.
And when a connection is lost on remote site, this error occurs sometimes.

I am not very deep into such low level python code so I tried it hard to find the root couse and this are my results.
I hope this results show's that there is sometings and we can talk about that or you can give me hints or help do resolve this issue.

thank you in advance

Bug/Error that occurs

The Error occurs on ov.getresult() (asyncio.windows_events.py:560)
OSError: [WinError 64] The specified network name is no longer available.

After that the server is unresponsive and the listening socket is closed. In that case we need to restart the service.

To me it looks like this happens when ov.getresult() is called and the remote host is disconnected.
When I just wrap this function in a try\except block the listening socket is not closing and uvicorn detects a connection lost.

Please let me know why just wrapping this into a try block resolve this issue,
because I don't know ;-)

Original

def finish_accept(trans, key, ov):
     ov.getresult() # Here happen the OSError
     # Use SO_UPDATE_ACCEPT_CONTEXT so getsockname() etc work.
     buf = struct.pack('@P', listener.fileno())
     conn.setsockopt(socket.SOL_SOCKET,
         _overlapped.SO_UPDATE_ACCEPT_CONTEXT, buf)
     conn.settimeout(listener.gettimeout())
     return conn, conn.getpeername()

Mod (No socket closings)

def finish_accept(trans, key, ov):
     try:
         ov.getresult()
     except OSError as Error:
         print("This helps", Error)
     # Use SO_UPDATE_ACCEPT_CONTEXT so getsockname() etc work.
     buf = struct.pack('@P', listener.fileno())
     conn.setsockopt(socket.SOL_SOCKET,
         _overlapped.SO_UPDATE_ACCEPT_CONTEXT, buf)
     conn.settimeout(listener.gettimeout())
     return conn, conn.getpeername()

Error without the try\catch block

Task exception was never retrieved
future: <Task finished name='Task-248' coro=<IocpProactor.accept.<locals>.accept_coro() done, defined at C:\Program Files\Python310\lib\asyncio\windows_events.py:570> exception=OSError(22, 'The specified network name is no longer available', None, 64, None) created at C:\Program Files\Python310\lib\asyncio\tasks.py:636>
source_traceback: Object created at (most recent call last):
  File "Y:\code_projects\pol\pol\server\test\brotle.py", line 31, in <module>
    asyncio.run(serve(Starlette(debug=True, routes=routes), config), debug=True)
  File "C:\Program Files\Python310\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Program Files\Python310\lib\asyncio\base_events.py", line 633, in run_until_complete
    self.run_forever()
  File "C:\Program Files\Python310\lib\asyncio\windows_events.py", line 321, in run_forever
    super().run_forever()
  File "C:\Program Files\Python310\lib\asyncio\base_events.py", line 600, in run_forever
    self._run_once()
  File "C:\Program Files\Python310\lib\asyncio\base_events.py", line 1888, in _run_once
    handle._run()
  File "C:\Program Files\Python310\lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Program Files\Python310\lib\asyncio\proactor_events.py", line 838, in loop
    f = self._proactor.accept(sock)
  File "C:\Program Files\Python310\lib\asyncio\windows_events.py", line 580, in accept
    tasks.ensure_future(coro, loop=self._loop)
  File "C:\Program Files\Python310\lib\asyncio\tasks.py", line 615, in ensure_future
    return _ensure_future(coro_or_future, loop=loop)
  File "C:\Program Files\Python310\lib\asyncio\tasks.py", line 636, in _ensure_future
    return loop.create_task(coro_or_future)
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\asyncio\windows_events.py", line 573, in accept_coro
    await future
  File "C:\Program Files\Python310\lib\asyncio\windows_events.py", line 819, in _poll
    value = callback(transferred, key, ov)
  File "C:\Program Files\Python310\lib\asyncio\windows_events.py", line 561, in finish_accept
    ov.getresult()
OSError: [WinError 64] The specified network name is no longer available

Minimal Example to reproduce

Start Server
Start Client

Server (Minimal Reproducible Example)

from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route

async def test_case(request):
    return JSONResponse({'hello': 'world'})

routes = [
    Route("/", endpoint=test_case, methods=['POST'])
]

##### Uvicorn Test
import uvicorn
uvicorn.run(
    Starlette(debug=True, routes=routes),
    reload=False,
    log_level="trace",
    host="0.0.0.0",
    port=8088,
)


##### Hypercorn Test
#import asyncio
#from hypercorn.config import Config
#from hypercorn.asyncio import serve

#config = Config()
#config.bind = ["localhost:8088"]
#asyncio.run(serve(Starlette(debug=True, routes=routes), config), debug=True)

Client (Minimal Reproducible Example)

import threading
import requests
import time
import sys


def test():
    for i in range(100):
        requests.post("http://127.0.0.1:8088/")

if __name__ == "__main__":
    for i in range(100):
        thread = threading.Thread(target=test, args=(), daemon=True)
        thread.start()
    time.sleep(1)
    sys.exit()

Tested with:
Uvicorn [x]
Hypercorn [x]

Environment
Python 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32

Linked PRs

kumaraditya303 · 2022-06-15T12:03:21Z

Have you reported this to uvicorn/starlette bug tracker? Can you provide a reproducer with only asyncio and no third-party packages? Have you tested main branch?

Cacsjep · 2022-06-15T12:32:28Z

I will try to create a reproducer with only asncio, I just testet latest released version and not main branch.

When a OS error happen on the finish_accept function then BaseProactorEventLoop._start_serving.loop falls into that OSError except block and close the listener sock at line 864.

But its not the listening sock that produce this OS Error it is the connection socket, so an OS Error on connection sock leads into closing the listening sock.

Connection socket after this issue happen:
<socket.socket fd=848, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 8088), raddr=(0, b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')> (0, b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

While i am trying to create a reproducer with asyncio could you tell me, is that a expected behaviour?

Thank u

Cacsjep · 2022-06-15T13:09:47Z

Tested with latest main branch and also can reproduce this error.
Here is a reproducer with asyncio, hope that helps.

Server

import asyncio


class Serve(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(b"OK")

async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(lambda: Serve(), '127.0.0.1', 8088)

    async with server:
        await server.serve_forever()

asyncio.run(main(), debug=True)

Client

import threading
import requests
import sys

def create():
    for i in range(100):
        requests.post("http://127.0.0.1:8088/", data="acs")

if __name__ == "__main__":
    for i in range(100):
        thread = threading.Thread(target=create, args=(), daemon=True)
        thread.start()
    sys.exit()

Cacsjep · 2022-06-16T17:29:09Z

Latest Findings

When AcceptEx encounter an ERROR_NETNAME_DELETED with WSAGetLastError() (overlapped.c),
this leads into an OSError when ov.getresult() is called at windows_events.py:576, this OSError is then catched up by BaseProactorEventLoop proactor_events.py:856 and leads into closing the listening sock.

The "real" underlying error that on my case occurs is a "WSAECONNRESET" while accepting the connection socket.

How can we deal with WSAECONNRESET or ERROR_NETNAME_DELETED errors when accpeting a socket without closing the listening socket, a easy fix is just closing accept socket and raise a ConnectionResetError that is cached from BaseProactorEventLoop serving loop.

--- a/Lib/asyncio/windows_events.py
+++ b/Lib/asyncio/windows_events.py
@@ -577,7 +577,13 @@ def accept(self, listener):
         ov.AcceptEx(listener.fileno(), conn.fileno())
 
         def finish_accept(trans, key, ov):
-            ov.getresult()
+            try:
+                ov.getresult()
+            except OSError as exc:
+                if exc.winerror in (_overlapped.ERROR_NETNAME_DELETED,
+                                    _overlapped.ERROR_OPERATION_ABORTED):
+                    conn.close()
+                    raise ConnectionResetError(*exc.args)
+                raise
             # Use SO_UPDATE_ACCEPT_CONTEXT so getsockname() etc work.
             buf = struct.pack('@P', listener.fileno())
             conn.setsockopt(socket.SOL_SOCKET,

index ddb9daca02..4b3d24c4e8 100644
--- a/Lib/asyncio/proactor_events.py
+++ b/Lib/asyncio/proactor_events.py
@@ -854,6 +854,8 @@ def loop(f=None):
                 if self.is_closed():
                     return
                 f = self._proactor.accept(sock)
+            except ConnectionResetError:
+                self.call_soon(loop)
             except OSError as exc:
                 if sock.fileno() != -1:
                     self.call_exception_handler({

kumaraditya303 · 2022-06-17T11:58:47Z

Can you post a diff rather than copying functions as it will easier to apply your changes? Also seems like #18199 is similar to this issue.

Cacsjep · 2022-06-17T14:18:27Z

Sure, I updated my latest comment. Yes looks similar.

gvanrossum · 2023-03-07T00:10:43Z

The PR linked above (gh-18199) is more appropriate for the issue in recv() and friends, gh-83037. The current issue is about accept(). A separate fix needs to be developed.

Assuming the best reaction to ERROR_NETNAME_DELETED is indeed retrying, there would be a number of options. We could special-case that error in proactor_events.py::BaseProactorEventLoop._start_serving.loop, or we could silently retry in windows_events.py::accept.finish_accept.

I have another niggling suspicion. There is a background Task involved here, created by

        tasks.ensure_future(coro, loop=self._loop)

in windows_events.py::accept. It is known that sometimes such tasks get garbage-collected before they've completed (e.g. #96323). But really, I don't think it is the cause of @iUnknwn's problem, based on the logs he sent me privately (those show that task as completed with exception=OSError(22, 'The specified network name is no longer available', None, 64, None)).

Anyway, if anyone knows how to reproduce getting ERROR_NETNAME_DELETED from an accept() call, please let us know!

gvanrossum · 2023-03-08T03:48:49Z

@kumaraditya303 Have you ever heard of ERROR_NETNAME_DELETED?

kumaraditya303 · 2023-03-08T04:02:33Z

I remember this, it is a very annoying error ERROR_NETNAME_DELETED and hard to reproduce, I don't remember much now and haven't used asyncio on Windows in a while.

gvanrossum · 2023-03-08T04:03:56Z

Could it be caused by IIS in reverse proxy mode?

kumaraditya303 · 2023-03-08T04:04:58Z

One thing I remember is that when I was getting this I looked at how other programming languages handle it and IIRC golang just retries in this case. This might have changed as it's been a while since then.

kumaraditya303 · 2023-03-08T04:17:42Z

Could it be caused by IIS in reverse proxy mode?

I don't know. Maybe @eryksun can shed some light on this issue?

eryksun · 2023-03-08T14:08:20Z

Winsock special cases several status codes returned by the AFD device that implements socket files. For example:

NTSTATUS	Winsock Error	Generic Windows Error
STATUS_LOCAL_DISCONNECT	WSAECONNABORTED	ERROR_NETNAME_DELETED
STATUS_CONNECTION_RESET	WSAECONNRESET	ERROR_NETNAME_DELETED
STATUS_REMOTE_DISCONNECT	WSAECONNRESET	ERROR_NETNAME_DELETED

If you're using asynchronous AcceptEx() and rely on the last error that's set by GetOverlappedResult(), then you'll get the generic Windows error ERROR_NETNAME_DELETED instead of the specialized Winsock error. The Overlapped type could support getting the last NT status that's set in overlapped.Internal. This could be used to disambiguate ConnectionAbortedError and ConnectionResetError in this case.

gvanrossum · 2023-03-09T17:20:54Z

Either way it means the connection was reset or disconnected, it's the same to asyncio. So it seems accept() should probably just retry without doing anything (there's no connection it can return). Perhaps in debug mode it could log something at the debug level. For recv() and friends we should discuss this in gh-83037.

gvanrossum · 2023-05-23T17:19:03Z

Looking for a volunteer to implement the proposed fix from my previous comment -- accept() should just retry instead of closing the socket, when ERROR_NETNAME_DELETED occurs. The code from gh-18199 can serve as an example.

fercod · 2023-07-05T10:10:55Z

Latest Findings

When AcceptEx encounter an ERROR_NETNAME_DELETED with WSAGetLastError() (overlapped.c), this leads into an OSError when ov.getresult() is called at windows_events.py:576, this OSError is then catched up by BaseProactorEventLoop proactor_events.py:856 and leads into closing the listening sock.

The "real" underlying error that on my case occurs is a "WSAECONNRESET" while accepting the connection socket.

How can we deal with WSAECONNRESET or ERROR_NETNAME_DELETED errors when accpeting a socket without closing the listening socket, a easy fix is just closing accept socket and raise a ConnectionResetError that is cached from BaseProactorEventLoop serving loop.

--- a/Lib/asyncio/windows_events.py
+++ b/Lib/asyncio/windows_events.py
@@ -577,7 +577,13 @@ def accept(self, listener):
         ov.AcceptEx(listener.fileno(), conn.fileno())
 
         def finish_accept(trans, key, ov):
-            ov.getresult()
+            try:
+                ov.getresult()
+            except OSError as exc:
+                if exc.winerror in (_overlapped.ERROR_NETNAME_DELETED,
+                                    _overlapped.ERROR_OPERATION_ABORTED):
+                    conn.close()
+                    raise ConnectionResetError(*exc.args)
+                raise
             # Use SO_UPDATE_ACCEPT_CONTEXT so getsockname() etc work.
             buf = struct.pack('@P', listener.fileno())
             conn.setsockopt(socket.SOL_SOCKET,

index ddb9daca02..4b3d24c4e8 100644
--- a/Lib/asyncio/proactor_events.py
+++ b/Lib/asyncio/proactor_events.py
@@ -854,6 +854,8 @@ def loop(f=None):
                 if self.is_closed():
                     return
                 f = self._proactor.accept(sock)
+            except ConnectionResetError:
+                self.call_soon(loop)
             except OSError as exc:
                 if sock.fileno() != -1:
                     self.call_exception_handler({

Thanks. This avoids my Uvicorn API (FastAPI) from crashing. Although, the following error continues to appear on the terminal:
OSError: [WinError 64] The specified network name is no longer available
ConnectionResetError: [WinError 64] The specified network name is no longer available

It happens to me when making multiple API requets in a short time to an endpoint which fetches data from another external API.

…ETED occurs, leads this into a closing of the serving socket This is done with the patches described by fercod in the GitHub thread python#93821. These patches were successfully tested by me and worked properly. The error propagates properly without closing the original socket and without causing the server to hang.

CoreyBird · 2024-09-27T07:15:04Z

I stumbled upon this solution after discovering it myself for the same issue mentioned. For me, this bug manifest itself by causing my server to hang up, requiring a hard restart to restore it's ability to serve again.

There are other ov.getresult() calls (one in accept, one in connect, and one in accept_pipe) - given we can get a client side closing the socket at any point I think it would make sense to catch the errors consistently for each state of the IocpProactor class.

+1 for solving the immediate issue. If we can get this in it would be a step in the right direction.

Cacsjep added the type-bug An unexpected behavior, bug, or error label Jun 14, 2022

AlexWaygood added the topic-asyncio label Jun 15, 2022

Cacsjep mentioned this issue Jun 17, 2022

gh-83037: Fix StreamWriter.wait_closed() ConnectionResetError for ProactorEventLoop #18199

Open

ezio-melotti added this to asyncio Jul 17, 2022

ezio-melotti moved this to Todo in asyncio Jul 17, 2022

iUnknwn mentioned this issue Feb 27, 2023

asyncio ProactorEventLoop: wait_closed() can raise ConnectionResetError #83037

Open

radkesvat mentioned this issue Oct 2, 2023

Several changes mostly on windows platform status-im/nim-chronos#451

Closed

xoriole mentioned this issue Mar 6, 2024

OSError: [WinError 64] The specified network name is no longer available Tribler/tribler#7759

Closed

StanHatko mentioned this issue Sep 13, 2024

gh-93821: Fix bug that accepting a socket connection and ERROR_NETNAME_DELETED occurs causes serving socket to be closed #124032

Open

bedevere-app bot mentioned this issue Sep 30, 2024

gh-93821: Handle connection resets on Windows #124779

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When accepting a socket connection and ERROR_NETNAME_DELETED occurs, leads this into a closing of the serving socket (BaseProactorEventLoop) #93821

When accepting a socket connection and ERROR_NETNAME_DELETED occurs, leads this into a closing of the serving socket (BaseProactorEventLoop) #93821

Cacsjep commented Jun 14, 2022 •

edited by bedevere-app bot

Loading

kumaraditya303 commented Jun 15, 2022

Cacsjep commented Jun 15, 2022

Cacsjep commented Jun 15, 2022 •

edited

Loading

Cacsjep commented Jun 16, 2022 •

edited

Loading

kumaraditya303 commented Jun 17, 2022

Cacsjep commented Jun 17, 2022

gvanrossum commented Mar 7, 2023

gvanrossum commented Mar 8, 2023

kumaraditya303 commented Mar 8, 2023

gvanrossum commented Mar 8, 2023

kumaraditya303 commented Mar 8, 2023

kumaraditya303 commented Mar 8, 2023

eryksun commented Mar 8, 2023

gvanrossum commented Mar 9, 2023

gvanrossum commented May 23, 2023 •

edited

Loading

fercod commented Jul 5, 2023 •

edited

Loading

CoreyBird commented Sep 27, 2024

When accepting a socket connection and ERROR_NETNAME_DELETED occurs, leads this into a closing of the serving socket (BaseProactorEventLoop) #93821

When accepting a socket connection and ERROR_NETNAME_DELETED occurs, leads this into a closing of the serving socket (BaseProactorEventLoop) #93821

Comments

Cacsjep commented Jun 14, 2022 • edited by bedevere-app bot Loading

Hey Guys

Application Description

Bug/Error that occurs

Linked PRs

kumaraditya303 commented Jun 15, 2022

Cacsjep commented Jun 15, 2022

Cacsjep commented Jun 15, 2022 • edited Loading

Cacsjep commented Jun 16, 2022 • edited Loading

kumaraditya303 commented Jun 17, 2022

Cacsjep commented Jun 17, 2022

gvanrossum commented Mar 7, 2023

gvanrossum commented Mar 8, 2023

kumaraditya303 commented Mar 8, 2023

gvanrossum commented Mar 8, 2023

kumaraditya303 commented Mar 8, 2023

kumaraditya303 commented Mar 8, 2023

eryksun commented Mar 8, 2023

gvanrossum commented Mar 9, 2023

gvanrossum commented May 23, 2023 • edited Loading

fercod commented Jul 5, 2023 • edited Loading

CoreyBird commented Sep 27, 2024

Cacsjep commented Jun 14, 2022 •

edited by bedevere-app bot

Loading

Cacsjep commented Jun 15, 2022 •

edited

Loading

Cacsjep commented Jun 16, 2022 •

edited

Loading

gvanrossum commented May 23, 2023 •

edited

Loading

fercod commented Jul 5, 2023 •

edited

Loading