-
Hi @amirgon, does ArrowFlight use the UCT or the UCP API?
-
Hi,
I would like to bring up an issue we are seeing with UCX when running Apache ArrowFlight over UCX.
I'm not sure if this is a bug in UCX, in ArrowFlight or in the way we use ArrowFlight with UCX, so I'm opening this for discussion.
The problem is a bit hard to reproduce: it only happens occasionally, when very large amounts of data are transferred over ArrowFlight with the UCX/Posix-shmem transport.
When the problem happens, communication freezes and UCX shows these errors:
and also:
The last one is unusual since we use `UCX_POSIX_USE_PROC_LINK=n`, and in such a case `uct_posix_shm_open` should be called and not `uct_posix_file_open`. `(null)` appears in the file name since `posix_config->dir` was null, as expected when using `uct_posix_shm_open`; however, `UCT_POSIX_SEG_FLAG_SHM_OPEN` was unexpectedly read as 0 on that specific segment, although it should have been 1 for all segments.
Digging into this further, we found out that the issue was related to memory pool corruption, specifically in the `mm_recv_desc` pool. The corruption caused elements from the allocated list to be linked to the free list, which eventually caused the errors above.
The reason `mm_recv_desc` got corrupted was that it was used from multiple threads. To support zero-copy, the received buffers were released only when they reached their final destination on a different thread, so the allocating thread and the releasing thread are different, while the UCX mpool is not thread safe.
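To illustrate how an unsynchronized free list can end up with an allocated element linked back into it, here is a minimal sketch that replays one possible interleaving deterministically in a single thread. It is not the UCX mpool code: `elem_t`, `freelist_t` and `replay_bad_interleaving` are hypothetical names, the pool is reduced to a plain singly linked list, and the interleaving shown is just one plausible ordering, not one we confirmed in a trace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified pool element and free list (not UCX types). */
typedef struct elem {
    struct elem *next;
    int          id;
} elem_t;

typedef struct {
    elem_t *head;
} freelist_t;

/*
 * Replays, step by step, a release on "thread B" that is interrupted by an
 * allocation on "thread A". Returns 1 if the element that thread A allocated
 * is still reachable from the free list afterwards, 0 otherwise.
 */
int replay_bad_interleaving(void)
{
    elem_t y = { NULL, 2 };
    elem_t x = { &y, 1 };
    elem_t z = { NULL, 3 };     /* in use, about to be released by thread B */
    freelist_t fl = { &x };     /* free list: x -> y */
    elem_t *allocated;
    elem_t *e;

    /* Thread B, release step 1: link z in front of the head it observed */
    z.next = fl.head;           /* z -> x */

    /* Thread A runs a complete allocation in between: pops x */
    allocated = fl.head;        /* x is now owned by thread A */
    fl.head   = allocated->next;

    /* Thread B, release step 2: publish z as the new head */
    fl.head = &z;               /* free list: z -> x -> y */

    /* Check whether the allocated element is still on the free list */
    for (e = fl.head; e != NULL; e = e->next) {
        if (e == allocated) {
            return 1;
        }
    }
    return 0;
}
```

In this replay the function returns 1: `x` is handed out to thread A and is still reachable from the free list, so a later allocation can return a buffer that is already in use.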
To fix that, we added a spinlock to the UCX mpool functions `ucs_mpool_get_inline` and `ucs_mpool_add_to_freelist`, and this seems to resolve the issue without impacting performance. (If this is a valid solution, I can create a PR.)
After this fix the issue doesn't block us any more; however, there are still open questions:
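For reference, the spinlock approach described above can be sketched roughly as follows. This is not the actual patch and does not use UCX internals: `locked_pool_t`, `locked_pool_init`, `locked_pool_get` and `locked_pool_put` are hypothetical names, the pool is a plain singly linked free list, and the lock is a C11 `atomic_flag` spinlock standing in for whatever lock primitive a real change would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical pool element and pool (not UCX types). */
typedef struct pool_elem {
    struct pool_elem *next;
} pool_elem_t;

typedef struct {
    pool_elem_t *freelist;
    atomic_flag  lock;          /* set while a thread holds the lock */
} locked_pool_t;

static void pool_lock(locked_pool_t *pool)
{
    /* Spin until the flag was previously clear, i.e. we acquired it */
    while (atomic_flag_test_and_set_explicit(&pool->lock,
                                             memory_order_acquire)) {
    }
}

static void pool_unlock(locked_pool_t *pool)
{
    atomic_flag_clear_explicit(&pool->lock, memory_order_release);
}

/* Put all elements of a caller-provided array on the free list. */
void locked_pool_init(locked_pool_t *pool, pool_elem_t *elems, size_t count)
{
    size_t i;

    atomic_flag_clear_explicit(&pool->lock, memory_order_relaxed);
    pool->freelist = NULL;
    for (i = 0; i < count; ++i) {
        elems[i].next  = pool->freelist;
        pool->freelist = &elems[i];
    }
}

/* Pop one element under the lock; returns NULL when the pool is empty. */
pool_elem_t *locked_pool_get(locked_pool_t *pool)
{
    pool_elem_t *elem;

    pool_lock(pool);
    elem = pool->freelist;
    if (elem != NULL) {
        pool->freelist = elem->next;
    }
    pool_unlock(pool);
    return elem;
}

/* Push an element back under the same lock; safe from another thread. */
void locked_pool_put(locked_pool_t *pool, pool_elem_t *elem)
{
    assert(elem != NULL);
    pool_lock(pool);
    elem->next     = pool->freelist;
    pool->freelist = elem;
    pool_unlock(pool);
}
```

Because both the pop and the push hold the same lock for the whole read-modify-write of the list head, the allocating thread and the releasing thread can no longer interleave inside those steps. The critical sections are only a few instructions long, which would be consistent with not seeing a performance impact.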