Random Core Dump #746
Comments
If you build tilemaker with debug symbols (add Or, you could run tilemaker on your data with gdb attached until you get a repro - the debugger will (hopefully!) stop on an assert. Or, if it's possible to share repro instructions, I could see if I can repro it. I do see that you say it doesn't happen reliably, but I'm hoping that a "soak test" of just constantly re-building the map might show it. Could you share all the necessary inputs? e.g. what version of tilemaker are you using, what is your config.json/process.lua file, and what is your input PBF file?
I'm on version 3.0.0. I'm running regular rebuilds (once a week) of the UK and Ireland PBFs from http://download.geofabrik.de/europe/united-kingdom-latest.osm.pbf. I logged this when the Ireland build failed. Last weekend the UK build failed with a core dump; this morning both completed OK.
Thanks! I've been able to reproduce intermittent segfaults on the current master (#760 (comment)), I'll see if I can sort out what's going on. I suspect passing
As it currently takes about 12 minutes to do both, including downloading the files, I might try it. But for the other tilesets I suspect that might be problematical.
Sat 21 Sep 2024 00:30:01 UTC Getting UK Data Set
bug 1: PooledString resizes `vector` without locks

`tables` is a shared pool of `char*` pointers, where each pointer points to a 64KB memory chunk. Some `PooledString`s identify their content by an index into this pool.

However, `tables` can grow. We correctly guard against concurrent mutation (for example, here: https://github.com/systemed/tilemaker/blob/7f0343045687ab2125910c81eed598c58fc2ff2d/src/pooled_string.cpp#L33-L39)

But readers expect to be able to read it without a lock, for example here, where the result of a read will be used to do a write: https://github.com/systemed/tilemaker/blob/7f0343045687ab2125910c81eed598c58fc2ff2d/src/pooled_string.cpp#L54

This pattern isn't safe with `vector`, since when the `vector` grows, it invalidates all existing pointers. It is safe with `deque`, so the fix is to switch to a `deque`.

bug 2: vector layer metadata `map` isn't guarded

`layers` is a shared object common to all OsmLuaProcessing threads. `layers.layers` is a `vector` that gets initialized and populated fully on the main thread before the Lua threads start, so accessing it without locks is fine.

`layers.layers[layer].attributeMap` is just a vanilla `map`, though, so mutating it from multiple threads without locks is dangerous. I just added a coarse lock for now. On my 16-core machine, it didn't seem to introduce contention, so I didn't bother to do anything fancier to minimize locking overhead.

I will optimistically say that this fixes systemed#746.
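For readers who want to see the two ideas in isolation, here is a minimal sketch. It is not tilemaker's code: `firstElementStaysPut`, `LayerAttributes`, and `fillConcurrently` are hypothetical names invented for illustration. The first helper checks the standard-library property the fix relies on (appending to a `std::deque` does not move existing elements, while a `std::vector` may reallocate and move them). The second part shows one coarse mutex around a plain `std::map`, assuming every access goes through the guarded methods.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: records the address of the first element, appends
// `extra` more elements, and reports whether the first element is still at
// the same address. Addresses are compared as integers so that no stale
// pointer is ever dereferenced or compared.
template <typename Container>
bool firstElementStaysPut(std::size_t extra) {
    Container c;
    c.emplace_back(16, 'x');
    const auto before = reinterpret_cast<std::uintptr_t>(&c.front());
    for (std::size_t i = 0; i < extra; ++i) {
        c.emplace_back(16, 'y');
    }
    const auto after = reinterpret_cast<std::uintptr_t>(&c.front());
    return before == after;
}

// Hypothetical stand-in for per-layer metadata: a plain std::map protected
// by a single coarse mutex, taken on every read and write.
class LayerAttributes {
public:
    void record(const std::string& key, int type) {
        std::lock_guard<std::mutex> lock(mutex_);
        attributeMap_[key] = type;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return attributeMap_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, int> attributeMap_;
};

// Spawns `threads` writers that each insert `perThread` distinct keys into
// one shared LayerAttributes, then returns the final number of entries.
std::size_t fillConcurrently(unsigned threads, unsigned perThread) {
    LayerAttributes attrs;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&attrs, t, perThread] {
            for (unsigned i = 0; i < perThread; ++i) {
                attrs.record("t" + std::to_string(t) + "_k" + std::to_string(i), 0);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return attrs.size();
}
```

Calling `firstElementStaysPut<std::deque<std::string>>(n)` is guaranteed to return true for any `n`, because appending at the end of a `deque` leaves references to existing elements valid. The same call with `std::vector<std::string>` will normally return false once the vector has had to reallocate. The sketch only demonstrates element stability; whether a given lock-free read pattern is fully race-free depends on how the container itself is accessed, which the sketch does not try to settle.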
I've got scheduled jobs running to create several sets of maps (one at a time).
Occasionally I get seg faults and core dumps. It's not always on a specific data set, and if I re-run the process it completes OK.
The output from the process looks like this:
Store size 74G | 1/6 Block 51929/51930 (44388 ms)
Store size 87G | 2/6 Block 51928/51930 (248660 ms)
Store size 99G | 3/6 Block 51928/51930 (156323 ms)
Store size 111G | 4/6 Block 51929/51930 (233455 ms)
Store size 131G | 5/6 Block 19581/51930 Segmentation fault (core dumped)
Sun 01 Sep 2024 06:00:55 UTC Completed EU processing
Any pointers on what I can do to investigate these would be great.