Server NetworkManager does not accept new connections from new clients if ONLY/FIRST client in game closes browser (webgl) #3117

Open
kevinstriker opened this issue Nov 7, 2024 · 17 comments
Labels: Investigating (Issue is currently being investigated), priority:high, stat:imported (Status - Issue is tracked internally at Unity), type:bug (Bug Report)

@kevinstriker

kevinstriker commented Nov 7, 2024

Description

If the last/only player in a Multiplay server setup quits the game by closing their browser, the server no longer accepts new clients joining the game.

Reproduce Steps

  1. Open the project (Unity 2022.3.x, I use 8).
  2. Create a project on Unity Services with: Multiplay Hosting, Lobby, Relay, Player Authentication, Matchmaker.
    In Unity, create a dedicated server build (note the name, you need it in the build configuration later). Create a build in Unity Services -> Multiplayer Hosting -> Builds by uploading the files you built from Unity. Then create a build configuration: use the name of the build and search for xxx.x86_64 as the game server executable, with these launch parameters:
    -nographics -port $$port$$ -queryport $$query_port$$ -logFile $$log_dir$$/Engine.log
    Create a fleet with the build configuration, the build, and a region.
  3. Create a Matchmaker queue named "WebFPSFFA" with a default pool. For the pool, choose the fleet, build configuration, etc.; for the region, choose the region of the fleet.
  4. Create a WebGL build (upload it to itch or anywhere else; HTTPS is most likely needed).
  5. Spin up a game by starting the WebGL game -> main menu -> Lobby Browser -> Create game
  6. Keep in mind it might take 3-4 minutes when a new server needs to be spun up. You can prevent this by setting the fleet's minimum available servers to 1; however, there are (small) costs to this. Don't forget to set the minimum available servers back to 0 afterwards.
  7. Leave the game by closing the browser. Reopen the browser, re-open the game, and try to join the same game (via lobby join).
    This will cause an exception in StartClient().
  8. Ensure there are NO other players in the game; otherwise the bug doesn't happen.

Actual Outcome

StartClient() does not succeed and times out.

Expected Outcome

New players should still be able to join the server by .StartClient().

Environment

  • OS: MacOS -> Build to WebGL and Dedicated Server Linux
  • Unity Version: 2022.3.8
  • Transport 2.4.0
  • NGO 1.11.0
  • Lobby 1.2.2
  • Multiplay 1.2.5
  • Matchmaker 1.1.5
  • Burst 1.8.18
  • Authentication 2.7.4
  • Unity services Core 1.13.0

Screenshot

Here you can see that no new connection event happens after the last player closes the game by closing the browser. (10-30 seconds later the server kicks them out of the game, and connectedClients = 0 afterwards.) After that, joining the game from the Editor, WebGL, or any other instance no longer works.
Screenshot 2024-11-07 at 09 46 43

Additional Context

See ZIP project.
NGO.zip

@kevinstriker kevinstriker added stat:awaiting triage Status - Awaiting triage from the Netcode team. type:bug Bug Report labels Nov 7, 2024
@kevinstriker kevinstriker changed the title from "Server gets confused when WebGL client is quitting the browser instead of using UI to exit game" to "Server NetworkManager is stuck in a UpdateLoop() after last/only player closes the browser on WebGL" Nov 8, 2024
@NoelStephensUnity NoelStephensUnity added the Investigating Issue is currently being investigated label Nov 8, 2024
@NoelStephensUnity
Collaborator

NoelStephensUnity commented Nov 8, 2024

@kevinstriker
My first step was to just see how everything worked running the dedicated server locally (had to comment out the MultiplayService.Instance.StartServerQueryHandlerAsync to do this) so I could just see it running using standard UDP via UTP.

The one thing that may or may not be causing issues is that it seems the players are never removed from the lobby upon disconnecting (not sure if this is intentional or not).
image
The above screenshot was after connecting and disconnecting 4 times to the same session.
Of course, this is not using WebSockets but I wanted to make sure that using Relay, Lobby, and a locally running dedicated server functioned as expected prior to digging deeper.

The next test is to enable Websockets and see if I get similar results where the server continues to accept connections and if not then determine if this is a Websocket + Multiplayer services issue (or the like). From an NGO perspective, it looks like the server isn't running into any issues.

The "Failed to connect to server." message on the client side is basically saying that either:

  • The server is refusing the connection (you don't have authentication enabled, so it should automatically approve).
  • The client is timing out on the connection (i.e. it isn't getting a response).
    • Based on the UDP test working, this could be an issue on the Multiplayer Services side of things (potentially).
    • Your screenshot looks like the time from connecting to relay to the time it logs that message is around 60 seconds which very well could be the client timing out.
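To help tell a refusal apart from a timeout on the client, a small diagnostic script along the following lines could log the transport's connect window and any disconnect reason. This is only a minimal sketch and not code from the attached project: the class name ClientConnectDiagnostics is hypothetical, and it assumes the UnityTransport component sits on the same GameObject as the NetworkManager. The logged window is a rough estimate that can be compared against the roughly 60 seconds visible in the screenshot.

```csharp
using Unity.Netcode;
using Unity.Netcode.Transports.UTP;
using UnityEngine;

// Hypothetical client-side helper for diagnosing a failed StartClient().
// Assumption: UnityTransport lives on the same GameObject as the NetworkManager.
public class ClientConnectDiagnostics : MonoBehaviour
{
    private void Start()
    {
        var networkManager = NetworkManager.Singleton;
        if (networkManager == null)
        {
            return;
        }

        var transport = networkManager.GetComponent<UnityTransport>();
        if (transport != null)
        {
            // Rough estimate of how long the transport keeps retrying before giving up.
            int connectWindowMs = transport.ConnectTimeoutMS * transport.MaxConnectAttempts;
            Debug.Log($"Approximate connect window: {connectWindowMs} ms");
        }

        networkManager.OnClientConnectedCallback += HandleConnected;
        networkManager.OnClientDisconnectCallback += HandleDisconnected;
    }

    private void OnDestroy()
    {
        var networkManager = NetworkManager.Singleton;
        if (networkManager != null)
        {
            networkManager.OnClientConnectedCallback -= HandleConnected;
            networkManager.OnClientDisconnectCallback -= HandleDisconnected;
        }
    }

    private void HandleConnected(ulong clientId)
    {
        Debug.Log($"Connected with client id {clientId}");
    }

    private void HandleDisconnected(ulong clientId)
    {
        var networkManager = NetworkManager.Singleton;
        string reason = networkManager != null ? networkManager.DisconnectReason : null;

        // The reason is only filled in when the server sent one before disconnecting,
        // so an empty reason is consistent with (but does not prove) a timeout.
        Debug.LogWarning(string.IsNullOrEmpty(reason)
            ? "Disconnected or failed to connect; no reason was provided."
            : $"Disconnected by server: {reason}");
    }
}
```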

If using Websockets works running a dedicated server locally on my system, then I will run through the dedicated side of things to try and replicate the issue...and if so I will need to get the services folks involved to help troubleshoot/narrow down what the cause could be.

@NoelStephensUnity
Collaborator

@kevinstriker
So, before I get too much further into this... I just noticed something:

The MainMenu (Client) scene contains a NetworkManager instance:
image

The Server scene contains a NetworkManager instance:
image

Then it appears the PrototypeMap (loaded by both of the above scenes) contains a NetworkManager instance:
image

You should only have 1 NetworkManager instance and loading a scene with a NetworkManager instance will override the NetworkManager.Singleton and I noticed the one in the PrototypeMap doesn't have Websockets enabled... this could potentially be the issue.

Can you replicate this issue if you remove the NetworkManager from the PrototypeMap?

@NoelStephensUnity NoelStephensUnity added the stat:awaiting response Status - Awaiting response from author. label Nov 8, 2024
@kevinstriker kevinstriker changed the title from "Server NetworkManager is stuck in a UpdateLoop() after last/only player closes the browser on WebGL" to "Server NetworkManager does not accept new connections from new clients if ONLY/FIRST client in game closes browser (webgl)" Nov 8, 2024
@kevinstriker
Author

kevinstriker commented Nov 8, 2024

Hi @NoelStephensUnity

First of all BIG thanks for getting back to me.

Yes, apologies, I do turn off the NetworkManager GameObject in the PrototypeMap.
The server gets its NetworkManager from the Server scene, and the client indeed gets its only from the MainMenu.

Yes, you're correct, I don't remove the lobby / players from the lobby properly (yet). Sorry about that, I will definitely add it later; since I'm basically just playtesting the game with a couple of friends, it's not that big of an issue yet.

After digging for days and getting quite hopeless about what might be causing this issue, I am really happy to get your response hahaha. That sounds a bit desperate, but just know it's big time appreciated.

Tonight I dug some more and started adding logging to some NGO / Transport classes in the PackageCache.
It really seems that the moment the "only" / "first" player disconnects by closing the browser, something prevents the server from accepting any new connection.

In case you have any questions really feel free to ask!

Kind regards,
Kevin

PS: sorry, I hit the wrong button when commenting; I accidentally closed the issue instead of just commenting, and have now re-opened it :)

@kevinstriker kevinstriker reopened this Nov 8, 2024
@NoelStephensUnity NoelStephensUnity self-assigned this Nov 8, 2024
@NoelStephensUnity NoelStephensUnity removed the stat:awaiting response Status - Awaiting response from author. label Nov 8, 2024
@NoelStephensUnity
Collaborator

No worries... just wanted to make sure there wasn't something broken within the NGO SDK itself, and so far there doesn't seem to be. I will proceed to the dedicated server hosting side of things then.

One thing that I am going to check on... is how the session is being created...
"Spin up a game by starting the WebGL game -> main menu -> Lobby Browser -> Create game"

It seems the 1st player is what gets the server instance to spin up...and so there could be something on the service side that considers the session ended if the owner who started the game completely disconnects (i.e. browser closed).

From an NGO side, there is nothing that I am seeing that would be causing this... so I will most likely need to get someone from the cloud services side to take a look and determine if this is just a settings thing or the like.
(Might take a day or two to run through that process... will get back to you by early to mid next week)

@kevinstriker
Author

kevinstriker commented Nov 8, 2024

Thanks @NoelStephensUnity !

Yeah completely understandable!

The game is a casual first person shooter for web. Games last 10 minutes and once spun up, the server should stay active for 10 minutes (the match time).

Game flow:

  1. The player opens the game and clicks "lobby browser"
  2. The player can join an existing game or create a new game
  3. When creating a new game, I'm using Matchmaker (this is based on the Unity docs) to simply get a ticket and then request a Multiplay server to be spun up.
  4. After a successful allocation, once the server has spun up, the player joins it. The game server (Multiplay) spins up Lobby and Relay, and these work together. After the client has joined the just-created Lobby / Relay, it calls the StartClient() method on the NetworkManager and joins the game.
  5. The UI for the in-game pause (opened by hitting the "ESC" button in-game, so on the prototype map :) ) shows a "Main Menu" button.
  6. Leaving the game by using the Main Menu button will work properly. The game server will keep accepting new connections
  7. Leaving the game (and you're the only player) with closing the browser, will cause the connection manager to be confused.

While this flow might not be the final flow, it is very common (and preferred) for web shooter games that the server can exist for 10 minutes without becoming "unjoinable".

People will indeed leave during these sessions quite often, hopping between games etc., so this small bug has quite some impact.

PS: I did think about closing the game when 0 players are in the game. However, this only partly solves the problem: the bug happens right away after the player closes the browser, before the timeout disconnect happens, meaning that even with the check there is a window of unjoinable games.

@kevinstriker
Author

kevinstriker commented Nov 8, 2024

Sorry this is the correct order of scenes:
Screenshot 2024-11-08 at 23 35 39

One last note that might be interesting: changing the HeartbeatTimeoutMs actually makes the bug disappear for the window that the HeartbeatTimeoutMs is set to.

Example: the default is 500 ms. If I join about 2 seconds after closing the browser, the bug happens and I can't join.
However, if I set a strangely high value like 20000 (20 seconds), close the browser (which normally triggers the bug), and join from another browser after 2 seconds or so, I can join perfectly fine.

This "workaround" did cause other bugs down the line and is not a valid solution, but it might give you a good angle on "where" to look.

@kevinstriker
Author

kevinstriker commented Nov 11, 2024

@NoelStephensUnity I'm working regular office hours the whole week and am also available in the evenings for any questions. So FYI, if I wrote something that is unclear (English is not my native language), really feel free to ask questions when you take a look sometime this week! Hopefully we can track this bug down :)

@NoelStephensUnity
Collaborator

"Leaving the game by using the Main Menu button will work properly. The game server will keep accepting new connections"
"Leaving the game (and you're the only player) with closing the browser, will cause the connection manager to be confused."

Ahhh... so when the last client disconnects by just closing the browser (i.e. non-graceful disconnect) it causes the issue to happen.
That helps... let me talk with some of the services folks to see if there are any known issues with that.

@kevinstriker
Author

kevinstriker commented Nov 12, 2024

@NoelStephensUnity yes indeed!!! Yesss, non-graceful way was indeed the name i was looking for!!!

This way of closing the game happens quite often for web games.

I was thinking I could just de-allocate the server when the player count is 0 as a workaround.
The problem, however, is that the bug happens right after closing the browser, not after the timeout disconnect event.
Meaning, closing the browser as the last player on the server immediately makes the connection manager/server invalid for new connections. However, the "last" player is only kicked out 30 seconds later by the timeout.
During this time, the server won't accept new connections, so new players see the timeout / experience the bug. So de-allocating at 0 players is not really a solution: it still leaves a window of xx seconds for the bug to happen, besides it being far from ideal to close the game server at 0 players.
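For reference, the "de-allocate at 0 players" check described above could be sketched roughly as follows on a dedicated server. This is an illustration only: the class name EmptyServerWatcher and the OnServerEmpty() placeholder are hypothetical, and the actual de-allocation call is left out. As noted above, the disconnect callback only fires once the timeout has elapsed, so this sketch does not close the window in which the server is unjoinable.

```csharp
using Unity.Netcode;
using UnityEngine;

// Hypothetical server-side helper; the class name and OnServerEmpty() are
// illustrative and not part of the project attached to this issue.
public class EmptyServerWatcher : MonoBehaviour
{
    private void Start()
    {
        var networkManager = NetworkManager.Singleton;
        if (networkManager != null)
        {
            networkManager.OnClientDisconnectCallback += HandleClientDisconnect;
        }
    }

    private void OnDestroy()
    {
        var networkManager = NetworkManager.Singleton;
        if (networkManager != null)
        {
            networkManager.OnClientDisconnectCallback -= HandleClientDisconnect;
        }
    }

    private void HandleClientDisconnect(ulong clientId)
    {
        var networkManager = NetworkManager.Singleton;
        if (networkManager == null || !networkManager.IsServer)
        {
            return;
        }

        // Count the clients that remain, ignoring the one that just left, so the
        // result does not depend on whether it was already removed from the list.
        int remaining = 0;
        foreach (ulong id in networkManager.ConnectedClientsIds)
        {
            if (id != clientId)
            {
                remaining++;
            }
        }

        if (remaining == 0)
        {
            Debug.Log("Last client disconnected; the server is now empty.");
            OnServerEmpty();
        }
    }

    // Placeholder: de-allocating the Multiplay server would go here.
    private void OnServerEmpty()
    {
    }
}
```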

@NoelStephensUnity
Collaborator

@kevinstriker
Yeah, that is a general issue across the board for ungraceful disconnects... you don't really know if a client has disconnected until it times out... now you can tweak the UnityTransport's "Disconnect Timeout MS" value to something less like say 10-15 seconds, but with WebGL you could potentially run into issues where it really didn't timeout but is just taking that long... so you might play with that value.
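The same value can also be set from code instead of the inspector. The following is a minimal sketch under some assumptions: the class name TransportTimeoutConfig is hypothetical, the component is assumed to sit on the same GameObject as the UnityTransport, and 15000 ms is just an example from the range mentioned above, not a recommended setting.

```csharp
using Unity.Netcode.Transports.UTP;
using UnityEngine;

// Hypothetical helper; attach it to the GameObject that holds the UnityTransport.
public class TransportTimeoutConfig : MonoBehaviour
{
    // Example value only, taken from the 10-15 second range discussed in the thread.
    [SerializeField] private int disconnectTimeoutMs = 15000;

    private void Awake()
    {
        var transport = GetComponent<UnityTransport>();
        if (transport == null)
        {
            Debug.LogWarning("No UnityTransport found next to this component.");
            return;
        }

        // Assumption: this runs before StartServer()/StartClient(), because the
        // value is picked up when the transport sets up its connection.
        transport.DisconnectTimeoutMS = disconnectTimeoutMs;
        Debug.Log($"Disconnect timeout set to {transport.DisconnectTimeoutMS} ms.");
    }
}
```

A lower value makes the server notice an ungraceful disconnect sooner, at the risk mentioned above of dropping slow WebGL clients that have not actually left.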

However, it seems odd that a new player cannot join even if the server still thinks the last player is connected...

Just to check, does this happen if say you join with 1 client and then close the browser or does it require more clients to join?

Also, have you set the NetworkManager Log Level to Developer and if so do you have the dedicated server log file where this scenario happened?
(Haven't had a chance to setup the dedicated server on my end yet so just looking to see if we have any additional information we can pull from in order to determine if this is service specific or NGO specific).

The only other thing I could think of would be to try reconnecting (after having ungracefully disconnected) in private browser mode to see if there are any cached values getting in the way of things...

Have raised this issue with the services group and once they have a chance to look over the issue and respond will let you know.

@kevinstriker
Author

kevinstriker commented Nov 13, 2024

Hi @NoelStephensUnity

Yes, I did set the log level to Developer!

It does not log the new connection.

It does happen in private browser mode, and also when I use another browser (also in private mode).

It does not need multiple players. If the first and only player leaves non-gracefully, the server no longer accepts new connections.

Changing the heartbeat value does not fix the corrupt state. It allows me to join, but many weird things happen afterwards: the server crashes, and if I leave before the crash it seems to think I was the host instead of a client, and many things go sideways. So the connection/session manager is still bugged; I just get one step further that way.

Hopefully the services team is able to look soon! Unfortunately this issue is quite big for web-based games using Multiplay.

@NoelStephensUnity NoelStephensUnity added priority:high stat:import Status - Issue is going to be saved internally and removed stat:awaiting triage Status - Awaiting triage from the Netcode team. labels Nov 13, 2024
@michalChrobot michalChrobot added stat:imported Status - Issue is tracked internally at Unity and removed stat:import Status - Issue is going to be saved internally labels Nov 18, 2024
@kevinstriker
Author

Hi @michalChrobot, I just saw the new labels on this issue 😃, thanks a lot for looking into this!!! Please feel free to ask any questions if something is unclear!

@michalChrobot
Contributor

Hi, yup, we imported the issue to track it internally. I will try to keep you updated when we have progress on it.

@kevinstriker
Author

kevinstriker commented Nov 22, 2024

@michalChrobot okay, thanks for letting me know!

So the game is planned to release in January, on a game portal website.

However, it was the game portal's tester who found this issue, and they mentioned it multiple times,
so I'm not sure if my game is allowed to release before I'm able to mark the bugs they found as completed.

Would it be possible to receive an estimate of when this will be looked into? A time frame would allow me to plan / communicate properly :)

Kind regards,
Kevin

@kevinstriker
Author

kevinstriker commented Dec 2, 2024

@NoelStephensUnity hi, is it possible to get an indication of when this issue will be looked into?

Unfortunately the game portal confirmed this bug has to be fixed before they will host the game.

I would be very happy to use a workaround, but none of the workarounds I tried work, or they result in bigger bugs.
This makes me a bit nervous, and I am not sure what my best course of action is.

While mid-January (the expected release) is still 6 weeks away, with the holidays coming up, would it still be safe to say that this issue will be looked into before then?

@tristan-unity

Hi @kevinstriker,

Thanks for reaching out and for your patience. We've moved your issue to the top of the multiplayer services team's backlog.

I'll keep you apprised with updates.

Cheers,
Tristan

@tristan-unity

Hi @kevinstriker,

Could you please try the following changes?

In the StartRelay method of the Server.cs file, make these updates:

  • Change the RelayServerData protocol from "wss" to "dtls":
    RelayServerData relayServerData = new RelayServerData(allocation, "dtls");
  • Set UseWebSockets to false:
    NetworkManager.Singleton.GetComponent<UnityTransport>().UseWebSockets = false;

This should help resolve the re-connection issue.
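Put together, the two changes might look roughly like this on the server side. This is a sketch under assumptions rather than the project's actual StartRelay method: the ServerRelaySetup class and StartServerOverDtls method are hypothetical names, the Allocation is assumed to come from the Relay allocation the server already creates, and calling SetRelayServerData and StartServer at this point is an assumption about how the rest of the method is structured.

```csharp
using Unity.Netcode;
using Unity.Netcode.Transports.UTP;
using Unity.Networking.Transport.Relay;
using Unity.Services.Relay.Models;

// Hypothetical wrapper around the two changes suggested above.
public static class ServerRelaySetup
{
    // 'allocation' is assumed to be the Relay allocation the server already created.
    public static bool StartServerOverDtls(Allocation allocation)
    {
        var transport = NetworkManager.Singleton.GetComponent<UnityTransport>();

        // Change 1: use "dtls" instead of "wss" for the server's Relay connection.
        var relayServerData = new RelayServerData(allocation, "dtls");

        // Change 2: the server itself no longer uses WebSockets.
        transport.UseWebSockets = false;

        // Assumption: both settings are applied before the server is started.
        transport.SetRelayServerData(relayServerData);
        return NetworkManager.Singleton.StartServer();
    }
}
```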

video.mp4

Cheers,
Tristan

4 participants