Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Deadlock when concurrently updating a hypertable linked to ordinary tables (might be linked to #5342 ) #7429

Open
timopulkkinen opened this issue Nov 7, 2024 · 8 comments
Labels

Comments

@timopulkkinen
Copy link

timopulkkinen commented Nov 7, 2024

What type of bug is this?

Locking issue

What subsystems and features are affected?

Query executor

What happened?

We are hitting a deadlock when we are inserting data concurrently in a database structure similar to the example below.

The issue occurs pretty much instantly when concurrency is introduced. If we drop triggers or foreign references the deadlock does not occur. It seems that this issue might be related to #5342 , but needs further analysis.

I created a shell script to replicate the issue. The script creates the database and structure below and runs concurrent testing. In my tests the deadlock occurs 100% time.

https://github.com/timopulkkinen/tsdb-concurrent-deadlock

TimescaleDB version affected

2.17.2

PostgreSQL version used

16

What operating system did you use?

MacOS 14.6.1 ARM

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

ERROR:  deadlock detected
DETAIL:  Process 7373 waits for ShareUpdateExclusiveLock on relation 61615 of database 61545; blocked by process 7375.
Process 7375 waits for ShareRowExclusiveLock on relation 61571 of database 61545; blocked by process 7374.
Process 7374 waits for ShareUpdateExclusiveLock on relation 61615 of database 61545; blocked by process 7373.

How can we reproduce the bug?

1. Clone the Github repo
2. Run the test script (set PGUSER, PGPASSWORD and other relevant env vars before)
3. The error should manifest itself.
@timopulkkinen timopulkkinen changed the title [Bug]: Deadlock when concurrently updating a hypertable linked to ordinary tables (might be linked to #5432 ) [Bug]: Deadlock when concurrently updating a hypertable linked to ordinary tables (might be linked to #5342 ) Nov 7, 2024
@akuzm
Copy link
Member

akuzm commented Nov 8, 2024

Thanks for the script, I can reproduce the problem.

@akuzm
Copy link
Member

akuzm commented Nov 8, 2024

deadlock-test.sh.gz

Slightly simplified the script. Might be related to this issue we have on a private repo: https://github.com/timescale/eng-database/issues/619

@akuzm
Copy link
Member

akuzm commented Nov 8, 2024

Seems to be related to creating the FK constraint for a new chunk:

2024-11-08 14:38:36.832 CET [3185162] akuzm@deadlock LOG:  00000: statement: BEGIN; SELECT simulate(id, (10 + random() * 90)::numeric) FROM (SELECT id FROM main_entity WHERE id % 3 = 1) e; COMMIT;
2024-11-08 14:38:36.832 CET [3185162] akuzm@deadlock LOCATION:  exec_simple_query, postgres.c:2273
2024-11-08 14:38:36.832 CET [3185163] akuzm@deadlock LOG:  00000: statement: BEGIN; SELECT simulate(id, (10 + random() * 90)::numeric) FROM (SELECT id FROM main_entity WHERE id % 3 = 2) e; COMMIT;
2024-11-08 14:38:36.832 CET [3185163] akuzm@deadlock LOCATION:  exec_simple_query, postgres.c:2273

/*
 * RowExclusive lock taken at parsing stage for INSERT/UPDATE/DELETE.
 * Likely for the main statement INSERT INTO op1 (main_entity_id, value, status) VALUES (_main_id, _value, 'created');
 * This conflicts with mode 6 ShareRowExclusive.
 */
[3185163] lock relation op1 mode 3
LockRelationOid+0A7 <- RangeVarGetRelidExtended+1FB <- relation_openrv_extended+02E <- table_openrv_extended+00B <- parserOpenTable+039 <- setTargetTable+059 <- transformStmt+E2F <- transformTopLevelStmt+0CC <- parse_analyze_withcb+051 <- pg_analyze_and_rewrite_withcb+064 <- _SPI_prepare_plan+141 <- SPI_prepare_extended+0C3 <- exec_prepare_plan+053 <- exec_stmt_execsql+0C6 <- exec_stmts+1B4 <- exec_stmt_block+3D0 <- plpgsql_exec_function+289 <- plpgsql_call_handler+131 <- ExecInterpExpr+57A <- ExecScan+14A <- standard_ExecutorRun+12B <- PortalRunSelect+113 <- 

/*
 * New chunk created for op1_event_data. ts_chunk_create_for_point serializes
 * chunk creation around a ShareUpdateExclusiveLock (4) on main table to
 * avoid multiple processes trying to create the same chunk.
 */
[3185162] lock relation op1_event_data mode 4
LockRelationOid+0A7 <- ts_chunk_create_for_point+035 <- ts_hypertable_create_chunk_for_point+041 <- ts_chunk_dispatch_get_chunk_insert_state+158 <- chunk_dispatch_exec+35D <- hypertable_modify_exec+254 <- standard_ExecutorRun+12B <- _SPI_execute_plan+60C <- SPI_execute_plan_with_paramlist+0B4 <- exec_stmt_execsql+173 <- exec_stmts+1B4 <- exec_stmt_block+3D0 <- plpgsql_exec_function+289 <- plpgsql_call_handler+131 <- ExecInterpExpr+57A <- ExecResult+13D <- standard_ExecutorRun+12B <- _SPI_execute_plan+60C <- SPI_execute_plan_with_paramlist+0B4 <- exec_run_select+0CE <- exec_stmts+13E4 <- exec_stmt_block+3D0 <- plpgsql_exec_trigger+1BA <- plpgsql_call_handler+103 <- ExecCallTriggerFunc+125 <- afterTriggerInvokeEvents+7E9 <- AfterTriggerEndQuery+098 <- standard_ExecutorFinish+109 <- _SPI_execute_plan+659 <- SPI_execute_plan_with_paramlist+0B4 <- exec_stmt_execsql+173 <- exec_stmts+1B4 <- exec_stmt_block+3D0 <- plpgsql_exec_function+289 <- plpgsql_call_handler+131 <- ExecInterpExpr+57A <- ExecScan+14A <- standard_ExecutorRun+12B <- PortalRunSelect+113 <- 

/*
 * Same as above, waits.
 */
[3185163] lock relation op1_event_data mode 4
LockRelationOid+0A7 <- ts_chunk_create_for_point+035 <- ts_hypertable_create_chunk_for_point+041 <- ts_chunk_dispatch_get_chunk_insert_state+158 <- chunk_dispatch_exec+35D <- hypertable_modify_exec+254 <- standard_ExecutorRun+12B <- _SPI_execute_plan+60C <- SPI_execute_plan_with_paramlist+0B4 <- exec_stmt_execsql+173 <- exec_stmts+1B4 <- exec_stmt_block+3D0 <- plpgsql_exec_function+289 <- plpgsql_call_handler+131 <- ExecInterpExpr+57A <- ExecResult+13D <- standard_ExecutorRun+12B <- _SPI_execute_plan+60C <- SPI_execute_plan_with_paramlist+0B4 <- exec_run_select+0CE <- exec_stmts+13E4 <- exec_stmt_block+3D0 <- plpgsql_exec_trigger+1BA <- plpgsql_call_handler+103 <- ExecCallTriggerFunc+125 <- afterTriggerInvokeEvents+7E9 <- AfterTriggerEndQuery+098 <- standard_ExecutorFinish+109 <- _SPI_execute_plan+659 <- SPI_execute_plan_with_paramlist+0B4 <- exec_stmt_execsql+173 <- exec_stmts+1B4 <- exec_stmt_block+3D0 <- plpgsql_exec_function+289 <- plpgsql_call_handler+131 <- ExecInterpExpr+57A <- ExecScan+14A <- standard_ExecutorRun+12B <- PortalRunSelect+113 <- 

/*
 * Creating a constraint on new chunk locks the referenced table with mode 6 (ShareRowExclusive).
 *
 * This deadlocks with mode 3 RowExclusive.
 */
 spi execute ' ALTER TABLE _timescaledb_internal._hyper_1_2_chunk ADD CONSTRAINT "2_4_op1_event_data_parent_fk" FOREIGN KEY (parent_id) REFERENCES public.op1(id) '
[3185162] lock relation op1 mode 6
LockRelationOid+0A7 <- RangeVarGetRelidExtended+1FB <- relation_openrv+026 <- table_openrv+00B <- ATExecAddConstraint+162 <- ATRewriteCatalogs+1E28 <- ATController+0A9 <- ProcessUtilitySlow+759 <- standard_ProcessUtility+706 <- timescaledb_ddl_command_start+23C <- _SPI_execute_plan+416 <- SPI_execute_extended+0EF <- exec_stmts+A72 <- exec_stmts+2614 <- exec_stmt_block+3D0 <- plpgsql_exec_function+289 <- plpgsql_call_handler+131 <- fmgr_security_definer+21A <- OidFunctionCall1Coll+055 <- create_non_dimensional_constraint+0DC <- ts_chunk_constraints_create+05A <- chunk_create_table_constraints+015 <- chunk_create_from_hypercube_after_lock+135 <- ts_chunk_create_for_point+989 <- ts_hypertable_create_chunk_for_point+041 <- ts_chunk_dispatch_get_chunk_insert_state+158 <- chunk_dispatch_exec+35D <- hypertable_modify_exec+254 <- standard_ExecutorRun+12B <- _SPI_execute_plan+60C <- SPI_execute_plan_with_paramlist+0B4 <- exec_stmt_execsql+173 <- exec_stmts+1B4 <- exec_stmt_block+3D0 <- plpgsql_exec_function+289 <- plpgsql_call_handler+131 <- ExecInterpExpr+57A <- ExecResult+13D <- standard_ExecutorRun+12B <- _SPI_execute_plan+60C <- SPI_execute_plan_with_paramlist+0B4 <- exec_run_select+0CE <- exec_stmts+13E4 <- exec_stmt_block+3D0 <- plpgsql_exec_trigger+1BA <- plpgsql_call_handler+103 <- ExecCallTriggerFunc+125 <- afterTriggerInvokeEvents+7E9 <- AfterTriggerEndQuery+098 <- standard_ExecutorFinish+109 <- _SPI_execute_plan+659 <- SPI_execute_plan_with_paramlist+0B4 <- exec_stmt_execsql+173 <- exec_stmts+1B4 <- exec_stmt_block+3D0 <- plpgsql_exec_function+289 <- plpgsql_call_handler+131 <- ExecInterpExpr+57A <- ExecScan+14A <- standard_ExecutorRun+12B <- PortalRunSelect+113 <- 

mybt+04A <- errfinish+0A7 <- DeadLockReport+1EE <- WaitOnLock+1A4 <- LockAcquireExtended+503 <- LockRelationOid+164 <- ts_chunk_create_for_point+035 <- ts_hypertable_create_chunk_for_point+041 <- ts_chunk_dispatch_get_chunk_insert_state+158 <- chunk_dispatch_exec+35D <- hypertable_modify_exec+254 <- standard_ExecutorRun+12B <- _SPI_execute_plan+60C <- SPI_execute_plan_with_paramlist+0B4 <- exec_stmt_execsql+173 <- exec_stmts+1B4 <- exec_stmt_block+3D0 <- plpgsql_exec_function+289 <- plpgsql_call_handler+131 <- ExecInterpExpr+57A <- ExecResult+13D <- standard_ExecutorRun+12B <- _SPI_execute_plan+60C <- SPI_execute_plan_with_paramlist+0B4 <- exec_run_select+0CE <- exec_stmts+13E4 <- exec_stmt_block+3D0 <- plpgsql_exec_trigger+1BA <- plpgsql_call_handler+103 <- ExecCallTriggerFunc+125 <- afterTriggerInvokeEvents+7E9 <- AfterTriggerEndQuery+098 <- standard_ExecutorFinish+109 <- _SPI_execute_plan+659 <- SPI_execute_plan_with_paramlist+0B4 <- exec_stmt_execsql+173 <- exec_stmts+1B4 <- exec_stmt_block+3D0 <- plpgsql_exec_function+289 <- plpgsql_call_handler+131 <- ExecInterpExpr+57A <- ExecScan+14A <- standard_ExecutorRun+12B <- PortalRunSelect+113 <- 
2024-11-08 14:38:41.529 CET [3185163] akuzm@deadlock DETAIL:  Process 3185163 waits for ShareUpdateExclusiveLock on relation 27905067 of database 27904298; blocked by process 3185162.
	Process 3185162 waits for ShareRowExclusiveLock on relation 27905050 of database 27904298; blocked by process 3185163.
	Process 3185163: BEGIN; SELECT simulate(id, (10 + random() * 90)::numeric) FROM (SELECT id FROM main_entity WHERE id % 3 = 2) e; COMMIT;
	Process 3185162: BEGIN; SELECT simulate(id, (10 + random() * 90)::numeric) FROM (SELECT id FROM main_entity WHERE id % 3 = 1) e; COMMIT;

2024-11-08 14:38:41.529 CET [3185163] akuzm@deadlock CONTEXT: 
	SQL statement "INSERT INTO op1_event_data (parent_id, main_entity_id, status) VALUES (_id, _main_entity_id, _new_status)"

	PL/pgSQL function update_op1_status(bigint,bigint,text) line 3 at SQL statement

	SQL statement "SELECT update_op1_status(NEW.id, NEW.main_entity_id, NEW.status)"

	PL/pgSQL function update_op1_and_transactions() line 3 at PERFORM

	SQL statement "INSERT INTO op1 (main_entity_id, value, status) VALUES (_main_id, _value, 'created')"

	PL/pgSQL function simulate(bigint,numeric) line 3 at SQL statement

@akuzm
Copy link
Member

akuzm commented Nov 8, 2024

It doesn't seem to happen if the inserts go to an already created chunk, e.g. if I run one simulate to completion before running the concurrent ones.

@timopulkkinen
Copy link
Author

Noted the same behaviour. As a temp workaround is there a way to pre-create chunks before inserting data?

@jannehietamaki
Copy link

@akuzm Hi, any idea if there would be a workaround for this?

@omaristalis
Copy link

omaristalis commented Nov 14, 2024

Saw the same issue a few days ago on a timescale.com hosted database:

image

@timopulkkinen
Copy link
Author

Any news on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants