Transaction hang with tx::errc::concurrent_transactions #23656

Open
blindspotbounty opened this issue Oct 7, 2024 · 3 comments
Labels
kind/bug Something isn't working

Comments

@blindspotbounty

Version & Environment

Redpanda version (from rpk version): v24.2.4

What went wrong?

At some point, Redpanda started returning tx::errc::concurrent_transactions for a new transactional producer.
We tried restarting Redpanda, but unfortunately that did not help:

WARN  2024-10-07 15:12:20,115 [shard 0:main] tx - tx_gateway_frontend.cc:2158 - [tx_id=process-identifier-assignment+process-identifier-status] commit_tx on consumer groups etag: 10 pid: {producer_identity: id=7189, epoch=2} tx_seq: 7 status: preparing_commit in term: 24 was rejected
WARN  2024-10-07 15:12:20,115 [shard 0:main] tx - tx_gateway_frontend.cc:2223 - [tx_id=process-identifier-assignment+process-identifier-status] remote commit etag: 10 pid: {producer_identity: id=7189, epoch=2} tx_seq: 7 in term: 24 rejected
WARN  2024-10-07 15:12:20,115 [shard 0:main] tx - tx_gateway_frontend.cc:2452 - [tx_id=process-identifier-assignment+process-identifier-status] error progressing transaction: {id: process-identifier-assignment+process-identifier-status, status: preparing_commit, pid: {producer_identity: id=7189, epoch=2}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 10, seq: 7, partitions: {ntp: {kafka/process-identifier-status/0}, etag: 1, revision: 6439}} - tx::errc::request_rejected
WARN  2024-10-07 15:12:20,115 [shard 0:main] tx - tx_gateway_frontend.cc:978 - [tx_id=process-identifier-assignment+process-identifier-status] error getting transaction metadata: tx::errc::concurrent_transactions
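For anyone triaging similar logs, here is a minimal sketch of pulling the key fields (source location, tx_id, producer id/epoch, error code) out of warnings shaped like the ones above. The function name parse_tx_warning and the regular expressions are hypothetical and derived only from the lines in this report; this is not a Redpanda or rpk tool, and other log formats may not match.

```python
import re

# Hypothetical helper: the patterns below are inferred from the WARN lines in
# this report only, and are assumptions about the log layout.
LINE_RE = re.compile(
    r"^(?P<level>\w+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\[shard (?P<shard>[^\]]+)\]\s+tx - (?P<rest>.*)$"
)
SOURCE_RE = re.compile(r"(?P<file>\w+\.cc):(?P<line>\d+)")
TX_ID_RE = re.compile(r"\[tx_id=([^\]]+)\]")
PID_RE = re.compile(r"producer_identity: id=(-?\d+), epoch=(-?\d+)")
ERRC_RE = re.compile(r"tx::errc::(\w+)")


def parse_tx_warning(line):
    """Return a dict of fields from one tx log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    rest = match.group("rest")
    source = SOURCE_RE.search(rest)
    tx_id = TX_ID_RE.search(rest)
    pid = PID_RE.search(rest)  # first producer_identity on the line
    error = ERRC_RE.search(rest)
    return {
        "level": match.group("level"),
        "timestamp": match.group("timestamp"),
        "shard": match.group("shard"),
        "source": f"{source.group('file')}:{source.group('line')}" if source else None,
        "tx_id": tx_id.group(1) if tx_id else None,
        "producer_id": int(pid.group(1)) if pid else None,
        "epoch": int(pid.group(2)) if pid else None,
        "error": error.group(1) if error else None,
    }


# Sample taken from the last warning quoted in this report.
sample = (
    "WARN  2024-10-07 15:12:20,115 [shard 0:main] tx - tx_gateway_frontend.cc:978 - "
    "[tx_id=process-identifier-assignment+process-identifier-status] "
    "error getting transaction metadata: tx::errc::concurrent_transactions"
)
record = parse_tx_warning(sample)
print(record)
```

Grouping the parsed records by tx_id and producer id makes it easier to see which transaction is stuck and in which state.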

What should have happened instead?

A new transactional producer should override previous instances (at least after a restart).

Another approach would be to have a way to force-delete transactions using the rpk tool.
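To make the first expectation concrete, here is a toy model of epoch-based producer fencing and of how a transaction stuck in preparing_commit could block every new producer. It is only loosely based on how the Kafka transaction protocol fences producers and is heavily simplified; ToyCoordinator, commit_can_finish, and all other names are hypothetical, and none of this is Redpanda's actual code.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    EMPTY = "empty"
    ONGOING = "ongoing"
    PREPARING_COMMIT = "preparing_commit"


class ConcurrentTransactions(Exception):
    """Toy stand-in for tx::errc::concurrent_transactions."""


@dataclass
class TxEntry:
    producer_id: int
    epoch: int = 0
    status: Status = Status.EMPTY


class ToyCoordinator:
    """Hypothetical in-memory coordinator; an assumption for illustration only."""

    def __init__(self):
        self.entries = {}
        # Assumption: models whether a pending commit can complete on all
        # participants (for example, the consumer group partition).
        self.commit_can_finish = True

    def init_producer(self, tx_id):
        entry = self.entries.get(tx_id)
        if entry is None:
            # First producer for this transactional id.
            entry = TxEntry(producer_id=len(self.entries) + 1)
            self.entries[tx_id] = entry
            return entry.producer_id, entry.epoch
        if entry.status is Status.PREPARING_COMMIT:
            # The pending commit must finish before a new producer takes over.
            if not self.commit_can_finish:
                raise ConcurrentTransactions(tx_id)
            entry.status = Status.EMPTY
        elif entry.status is Status.ONGOING:
            # An open transaction from the old instance is aborted.
            entry.status = Status.EMPTY
        # Bumping the epoch fences the previous producer instance.
        entry.epoch += 1
        return entry.producer_id, entry.epoch


coordinator = ToyCoordinator()
first = coordinator.init_producer("demo-tx")
second = coordinator.init_producer("demo-tx")  # overrides the first instance

# Simulate the reported state: commit stuck because a participant rejects it.
coordinator.entries["demo-tx"].status = Status.PREPARING_COMMIT
coordinator.commit_can_finish = False
try:
    coordinator.init_producer("demo-tx")
    stuck = False
except ConcurrentTransactions:
    stuck = True
print(first, second, stuck)
```

In this model a restart changes nothing, because the stuck status is persisted state rather than in-memory state, which is consistent with what we observe here.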

How to reproduce the issue?

I am not sure about the exact scenario; however, it might be related to removed topics that were used by this transaction or by a consumer related to it.

Additional information

We saved the storage for this instance and can provide more logs/data if needed with required layers.

Previously, we used a short timeout (~10 minutes) for transaction eviction. After the case in #22670, we decided to remove this value (especially for old instances).

blindspotbounty added the kind/bug label on Oct 7, 2024
@bharathv
Contributor

bharathv commented Oct 7, 2024

Do you see any warnings with text "group.cc" around this time on any of the brokers?

@blindspotbounty
Author

blindspotbounty commented Oct 8, 2024

@bharathv yes, I see this:

WARN  2024-10-07 15:12:20,114 [shard 0:kafk] tx - [N:process-identifier-assignment S:Stable G:3] group.cc:1712 - commit_tx request: {ntp {kafka/__consumer_offsets/2} pid {producer_identity: id=7189, epoch=2} tx_seq 7 group_id process-identifier-assignment timeout 2000} failed - producer not found

@bharathv
Contributor

bharathv commented Oct 8, 2024

Thanks, we have a plausible theory for this. I need to try to reproduce it in a test.
