Thanks for sharing the success, as well as the challenges and solutions you found.
Very cool use case, and thanks a lot for the writeup! I would definitely watch a conference talk on this subject, and I suspect many Python conferences would be happy to have it, if you are interested in submitting such a thing!
Hi Everyone,
I've successfully managed to make and release a game using micropython!
I'd like to present some of the challenges I had to deal with and how I solved them, in the hope that it helps others, or at least provides some insight into how micropython can be used outside of its intended purpose.
What:
About 15 years ago my company made about 6 PC games using the Ogre3D engine, with cpython for scripting. Not the best idea in hindsight, but it let us make these games much faster than doing them in C++ would have.
In 2024 I remastered the first of these and released it on many platforms: PC, Mac, Linux, iOS and Android.
It even runs on 32 bit Windows XP and old PowerPC Macs (OSX 10.4+)!
Why micropython:
I tried embedding cpython in my game engine, but it was sooo horribly complicated that I almost gave up. At that point I found micropython, which is much simpler and more portable.
I ran a few experiments using micropython's 'unix' port and managed to implement bidirectional communication between C++ and Python code.
Then I ported that to all the platforms I need to publish on, and everything worked quite well!
API:
I exposed all of the Ogre3D API I needed for these games, in the same form as the original API (the Python-Ogre project).
By 'exposed' I really mean I wrote all the API wrappers by hand. I didn't want to automate it because I'm using maybe 1% of Ogre's API anyway.
Problem 1 - Garbage collection:
Python's garbage collection is a problem for games because they need low latency. This is especially true for fast games, where any frame drop can be quite noticeable.
Unfortunately, micropython's garbage collection is much, much slower than cpython's (in my experience).
From studying the code, my understanding is that micropython's GC has to go over the entire heap, mark everything that is still in use, and then free the rest.
There doesn't seem to be a way to do a partial GC that I could spread across several frames.
Solution:
To solve this, I moved all python execution to a separate thread and synchronized it with the main thread using binary semaphores, in a producer/consumer pattern.
This means my game update thread goes through the usual updates, and when it's time to update python, it pauses and waits until the python thread does its thing.
Once the python thread is done, my main thread continues and updates Ogre3D, audio and the other non-python parts.
After the python update call, in parallel with my other updates, the python thread runs garbage collection at various intervals.
This gives GC enough time to run; it can even overrun a VSync interval a bit without affecting my other threads.
This only works on reasonably fast devices.
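A minimal C++ sketch of that handshake, assuming a hypothetical `PythonThread` wrapper (not the author's code): the main thread calls `runUpdate()` each frame and blocks until the Python thread finishes its update, then the Python thread runs its `pythonIdle` callback, the slot where a GC pass would go, while the main thread carries on. `BinarySemaphore` is a small stand-in built on a mutex and condition variable; C++20's `std::binary_semaphore` would serve the same purpose.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Minimal binary semaphore: release() sets the signal, acquire() waits for it and consumes it.
class BinarySemaphore {
public:
    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_one();
    }
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Hypothetical wrapper: every Python (MicroPython) call happens on one dedicated thread.
// pythonUpdate: per-frame script update; the main thread waits for it to finish.
// pythonIdle:   work that overlaps the rest of the frame, e.g. a GC pass.
class PythonThread {
public:
    PythonThread(std::function<void()> pythonUpdate, std::function<void()> pythonIdle)
        : update_(std::move(pythonUpdate)), idle_(std::move(pythonIdle)),
          worker_([this] { run(); }) {}

    ~PythonThread() {
        quit_.store(true);
        startUpdate_.release();  // wake the worker so it can observe quit_
        worker_.join();
    }

    // Main thread, once per frame: hand control to Python and block until its update is done.
    void runUpdate() {
        startUpdate_.release();
        updateDone_.acquire();
    }

private:
    void run() {
        for (;;) {
            startUpdate_.acquire();
            if (quit_.load()) return;
            update_();
            updateDone_.release();  // main thread resumes (Ogre3D, audio, ...)
            idle_();                // runs in parallel with the main thread
        }
    }

    std::function<void()> update_;
    std::function<void()> idle_;
    BinarySemaphore startUpdate_;
    BinarySemaphore updateDone_;
    std::atomic<bool> quit_{false};
    std::thread worker_;  // declared last so it starts after all other members exist
};
```

Because the interpreter is only ever touched from the worker thread, it never has to be shared between threads; the main thread only interacts with the two semaphores.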
Problem 2 - When to run GC:
I use the 'split heap' feature: I allocate about 1 MB of heap RAM up front, and micropython allocates more heaps if GC fails to free enough space.
The problem is that the more heaps I have, the more time GC takes! And in split-heap mode, GC takes more time than on one larger single heap.
To solve this, I hacked into the gc calls: each one now passes an enum saying why GC was requested and calls my functions, which then allow or disallow the collection.
For example, if GC is invoked because micropython can't allocate enough space, I let it through, but if only the GC threshold was hit, I block it and instead run GC manually on the python thread, before space actually runs out.
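One way such a hook could look, as a sketch: `GcReason` and `GcGate` are hypothetical names, and wiring `allow()` into micropython's GC call sites is the patch described in the post, which is not shown here. A threshold-triggered request is turned into a "pending" flag that the Python thread can act on in its idle slot.

```cpp
#include <cassert>

// Hypothetical reasons a patched GC call site could report before collecting.
enum class GcReason {
    AllocationFailed,  // no free space left: collecting is mandatory
    ThresholdReached,  // allocation threshold crossed: collecting is optional
    Manual             // our own scheduled collection on the Python thread
};

// Hypothetical gate consulted before every collection.
// Assumes it is only called from the Python thread, so it needs no locking.
class GcGate {
public:
    // Returns true if the collection may proceed right now.
    bool allow(GcReason reason) {
        switch (reason) {
        case GcReason::AllocationFailed:
            ++forcedCollections_;       // unavoidable; useful input for threshold tuning
            collectionPending_ = false; // a full collection is about to run anyway
            return true;
        case GcReason::ThresholdReached:
            collectionPending_ = true;  // defer: collect later in the idle slot
            return false;
        case GcReason::Manual:
            collectionPending_ = false;
            return true;
        }
        return true;  // unknown reason: never block
    }

    bool collectionPending() const { return collectionPending_; }
    int forcedCollections() const { return forcedCollections_; }

private:
    bool collectionPending_ = false;
    int forcedCollections_ = 0;
};
```

In the idle slot, the Python thread would check `collectionPending()` and, if set, run a collection tagged `GcReason::Manual`.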
Still, depending on the game and where in it the player is, I sometimes get a lot of GC calls that must go through. These force GC to run when I don't want it to, and can add another micropython heap, which is then difficult to get rid of because of memory fragmentation.
So I set an arbitrary GC threshold at startup and tweak it up or down whenever I get a GC call I can't block. After a few such situations GC stabilizes, and I get almost no GC calls I can't avoid.
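The post doesn't say exactly how the threshold is adjusted, so the following is only one plausible policy, with hypothetical names: halve the threshold after every collection that couldn't be blocked (so the Python thread collects earlier), and grow it by 1/8 after a quiet interval (so collections become rarer and cost less power), clamped to a chosen range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical adaptive GC threshold, in bytes allocated between scheduled collections.
// Assumption: lower = collect sooner (fewer forced GCs), higher = collect less often.
class GcThresholdTuner {
public:
    GcThresholdTuner(std::size_t initial, std::size_t minBytes, std::size_t maxBytes)
        : threshold_(initial), min_(minBytes), max_(maxBytes) {
        assert(minBytes <= initial && initial <= maxBytes);
    }

    // A collection we could not block just happened: collect earlier from now on.
    void onForcedCollection() { threshold_ = std::max(min_, threshold_ / 2); }

    // A scheduled collection ran with no forced GC since the last one: back off a little.
    void onQuietInterval() { threshold_ = std::min(max_, threshold_ + threshold_ / 8); }

    std::size_t threshold() const { return threshold_; }

private:
    std::size_t threshold_;
    std::size_t min_;
    std::size_t max_;
};
```

The resulting value could then be applied from the Python side with MicroPython's `gc.threshold()`, which sets how many bytes may be allocated before a collection is triggered.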
This could be avoided by running GC on every update, but that consumes too much power, which is a problem for phones and laptops because the device heats up. So I have to be smart about it.
Problem 3 - High FPS devices:
In this first game, my GC calls took anywhere from 1 ms up to 20 ms on very slow devices.
Most devices run at 60 FPS (about 16.7 ms per frame), but high-end devices go as high as 120-144 Hz, which leaves only about 7-8 ms per frame, and that is a problem.
So, to solve this, I run python code at a reduced FPS and interpolate positions of objects in between frames.
On regular devices, both the game and python run at 60 FPS.
On slow devices, python runs at 30 FPS, while the game aims for 60.
On high-end devices, I run the game logic at 40-60 FPS and interpolate the rest up to 120-144 Hz.
This is not a great option if you need very low-latency input, but for my games, this works quite well.
So while GC runs on the python thread, my main update thread is free to run and interpolate object positions without blocking the python thread, and vice versa. This also reduces device heating.
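A sketch of the usual fixed-rate update with interpolation, in C++ with hypothetical names: `FixedRateTicker` decides how many Python ticks to run for each rendered frame (constructed with e.g. 30.0 on slow devices or 60.0 on regular ones), and `InterpolatedPosition` keeps an object's last two Python positions so the renderer can blend between them using `alpha()`.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

inline Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
}

// One scripted object's position as of the last two Python ticks.
struct InterpolatedPosition {
    Vec3 previous{0, 0, 0};
    Vec3 current{0, 0, 0};

    void onPythonTick(const Vec3& newPosition) {
        previous = current;
        current = newPosition;
    }
    // alpha in [0, 1): how far the rendered frame lies between the two ticks.
    Vec3 at(float alpha) const { return lerp(previous, current, alpha); }
};

// Runs Python at a fixed tick rate, independent of the display refresh rate.
class FixedRateTicker {
public:
    explicit FixedRateTicker(double ticksPerSecond) : step_(1.0 / ticksPerSecond) {
        assert(ticksPerSecond > 0.0);
    }

    // Feed the real frame time; returns how many Python updates to run this frame.
    int advance(double frameSeconds) {
        accumulator_ += frameSeconds;
        int ticks = 0;
        while (accumulator_ >= step_ && ticks < kMaxTicksPerFrame) {
            accumulator_ -= step_;
            ++ticks;
        }
        // After a long stall, drop the backlog instead of running a burst of catch-up ticks.
        if (accumulator_ >= step_) accumulator_ = std::fmod(accumulator_, step_);
        return ticks;
    }

    // Interpolation factor for rendering the current frame.
    double alpha() const { return accumulator_ / step_; }

private:
    static constexpr int kMaxTicksPerFrame = 4;
    double step_;
    double accumulator_ = 0.0;
};
```

Blending between the previous and current tick means what's on screen trails the Python state by up to one tick, which is the input-latency cost mentioned above.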
Problem 4 - Heap size:
Heap size is important: as said before, the bigger your micropython heap, the slower python seems to run and the slower GC runs.
Single-heap mode is the fastest, but if you run out of heap space, it's game over 🙂
Split-heap is the best, if not the only, option, but it adds extra latency to both python execution and GC.
I spent a lot of time balancing how much heap to allocate initially, and it's really quite individual for each game.
Next, I had to move some code to C++ to avoid creating large lists and maps in python.
Heap fragmentation is a big problem, especially in split-heap mode. If you allocate an object with a long life and it ends up in another heap, that heap won't be removed as long as that one object lives.
Also, micropython allocates each new heap at double the size of the previous one, which is not ideal for my use case, so I hacked that code to allocate new heaps of the same size as the initial heap.
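A toy model (not MicroPython's allocator; all names are hypothetical) makes both points concrete: with a 1 MB initial heap and three extra heaps, doubling reserves 1 + 2 + 4 + 8 = 15 MB versus 4 MB for same-size heaps, and a single live object keeps its whole heap area from being released.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class GrowthPolicy { Doubling, SameAsInitial };

// One heap area in the model: its size and how many live objects it still holds.
struct HeapArea {
    std::size_t bytes;
    int liveObjects;
};

// Hypothetical model of split-heap growth and release.
class SplitHeapModel {
public:
    SplitHeapModel(std::size_t initialBytes, GrowthPolicy policy)
        : initialBytes_(initialBytes), policy_(policy) {
        assert(initialBytes > 0);
        areas_.push_back({initialBytes, 0});
    }

    // GC could not free enough space: add another heap area.
    void grow() {
        std::size_t next = policy_ == GrowthPolicy::Doubling ? areas_.back().bytes * 2
                                                             : initialBytes_;
        areas_.push_back({next, 0});
    }

    HeapArea& area(std::size_t index) { return areas_.at(index); }

    // Release every extra area holding no live objects; the initial area always stays.
    void releaseEmptyAreas() {
        areas_.erase(std::remove_if(areas_.begin() + 1, areas_.end(),
                                    [](const HeapArea& a) { return a.liveObjects == 0; }),
                     areas_.end());
    }

    std::size_t areaCount() const { return areas_.size(); }

    std::size_t totalBytes() const {
        std::size_t total = 0;
        for (const HeapArea& a : areas_) total += a.bytes;
        return total;
    }

private:
    std::size_t initialBytes_;
    GrowthPolicy policy_;
    std::vector<HeapArea> areas_;
};
```

With doubling, the same single long-lived object in the fourth area would pin 8 MB instead of 1 MB.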
Another interesting fact is that the heap block size is tied to the CPU word size, so on a 64-bit CPU you'll need a heap roughly twice as large as on a 32-bit CPU.
In my case I'm targeting 32-bit Windows XP, 32-bit PowerPC and 32-bit ARM for Raspberry Pi systems, so I can get away with allocating half the heap, which improves python and GC speed on these devices. And since 32-bit devices tend to be slow these days, you need every edge you can get 🙂
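The arithmetic, as a sketch that assumes MicroPython's default GC block size of four machine words (`MICROPY_BYTES_PER_GC_BLOCK`, which a port can override) and an object made only of pointer-sized fields; `heapBytesFor` is a hypothetical helper. The object needs the same number of blocks on both architectures, but each block is twice as large on 64-bit.

```cpp
#include <cstddef>

// Assumed GC block size: 4 machine words (check MICROPY_BYTES_PER_GC_BLOCK for your port).
constexpr std::size_t gcBlockBytes(std::size_t wordBytes) { return 4 * wordBytes; }

// Heap bytes used by `count` objects of `objectWords` pointer-sized fields each,
// with every allocation rounded up to whole GC blocks.
constexpr std::size_t heapBytesFor(std::size_t objectWords, std::size_t count,
                                   std::size_t wordBytes) {
    return (objectWords * wordBytes + gcBlockBytes(wordBytes) - 1) / gcBlockBytes(wordBytes) *
           gcBlockBytes(wordBytes) * count;
}

static_assert(heapBytesFor(3, 1000, 4) == 16000, "32-bit: 12-byte object fills one 16-byte block");
static_assert(heapBytesFor(3, 1000, 8) == 32000, "64-bit: 24-byte object fills one 32-byte block");
```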
So, here you go, hope this helps someone, feel free to ask any questions, I'd be glad to answer!
If you want to check out the game in question, click here.