Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Segfault when updating a row in a hypertable #7497

Open
timopulkkinen opened this issue Nov 28, 2024 · 10 comments
Open

[Bug]: Segfault when updating a row in a hypertable #7497

timopulkkinen opened this issue Nov 28, 2024 · 10 comments
Assignees
Labels

Comments

@timopulkkinen
Copy link

timopulkkinen commented Nov 28, 2024

What type of bug is this?

Crash

What subsystems and features are affected?

Compression, Query executor

What happened?

Hi,

experiencing a segfault when updating a freshly created row in a hypertable chunked by row creation time (different transactions for create and update). If creation time is set to current time, the crash does not reproduce.

Table schema ( a bit redacted):

create table a (
  id bigserial not null
  foreign_key_aa bigint not null references aa,
  foreign_key_ab int not null references ab,
  foreign_key_ac text not null references ac,
  id2 bigint,
  reference text not null,
  value int,
  ...
  created_at timestamptz not null default now(),
  created_at_date timestamptz not null default date_trunc('day', now() AT TIME ZONE 'UTC'),
  updated_at timestamptz not null default now(),
  unique(foreign_key_ab, external_id_a, created_at_date),
  PRIMARY KEY (id, created_at_date)
);

SELECT create_hypertable('a', 'created_at_date', chunk_time_interval => INTERVAL '1 week');

ALTER TABLE a SET (
  timescaledb.compress,
  timescaledb.compress_orderby = 'created_at_date, reference, id',
  timescaledb.compress_segmentby = 'foreign_key_aa, foreign_key_ab, foreign_key_ac, id2'
);

SELECT add_compression_policy('a', INTERVAL '6 months');

CREATE INDEX a_foreign_key_aa_idx ON a(foreign_key_aa);
CREATE INDEX a_id2 ON a(id2);
CREATE INDEX a_foreign_key_ab_idx ON ab(game_integration_id, reference);

Query:

update a set value = 0  where id = y

Note: as you know, the update in this form will cause a complete scan of all chunks of the table, but still a segfault shouldn't happen.

There is also a trigger bound to the update event that spawns updates in other tables.

TimescaleDB version affected

2.17.2

PostgreSQL version used

17.0.1 (Percona builds)

What operating system did you use?

Ubuntu 24.04 x64

What installation method did you use?

Deb/Apt

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

Stack trace:

#0  0x00007fa712802a96 in list_nth (n=<optimized out>, list=0x5f9602895610) at /usr/include/postgresql/17/server/nodes/pg_list.h:302
No locals.
#1  ExecInitUpdateProjection (mtstate=mtstate@entry=0x5f9602481ba0, resultRelInfo=resultRelInfo@entry=0x5f960290ed00) at /usr/src/timescaledb-2-postgresql-17-2.17.2~ubuntu24.04/src/nodes/hypertable_modify.c:1249
        node = 0x5f96024a8bc0
        subplan = 0x5f96024a8cc8
        estate = 0x5f9602480ce8
        relDesc = 0x7fa816ef5580
        whichrel = <optimized out>
        updateColnos = <optimized out>
#2  0x00007fa71280dcd8 in ExecModifyTable (pstate=<optimized out>, cs_node=<optimized out>) at /usr/src/timescaledb-2-postgresql-17-2.17.2~ubuntu24.04/src/nodes/hypertable_modify.c:937
        resultoid = <optimized out>
        estate = <optimized out>
        subplanstate = 0x5f96024c0a38
        oldSlot = <optimized out>
        relinfos = 0x0
        resultRelInfo = 0x5f960290ed00
        oldtuple = <optimized out>
        cds = <optimized out>
        ht_state = <optimized out>
        node = <optimized out>
        context = {mtstate = 0x5f9602481ba0, epqstate = 0x5f9602481c88, estate = 0x5f9602480ce8, planSlot = 0x5f9602c51288, GetUpdateNewTuple = 0x5f960316e8c8, relaction = 0x0, tmfd = {ctid = {ip_blkid = {
                bi_hi = 3304, bi_lo = 584}, ip_posid = 24470}, xmax = 38439176, cmax = 24470, traversed = 208}, cpUpdateRetrySlot = 0x5f95ffda47a3 <ExecInitNode+707>, cpUpdateReturningSlot = 0x5f96031a4328, 
          lockmode = 993560320}
        operation = <optimized out>
        slot = <optimized out>
        tupleid = <optimized out>
        oldtupdata = {t_len = 2652288640, t_self = {ip_blkid = {bi_hi = 32765, bi_lo = 0}, ip_posid = 50090}, t_tableOid = 24469, t_data = 0x5f96024819f8}
        tuple_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 25}
        lc = <optimized out>
        ht_state = <optimized out>
        node = <optimized out>
        context = <optimized out>
        estate = <optimized out>
        operation = <optimized out>
        resultRelInfo = <optimized out>
        subplanstate = <optimized out>
        slot = <optimized out>
        oldSlot = <optimized out>
        tupleid = <optimized out>
        tuple_ctid = <optimized out>
        oldtupdata = <optimized out>
        oldtuple = <optimized out>
        relinfos = <error reading variable relinfos (Cannot access memory at address 0x0)>
        lc = <optimized out>
        cds = <optimized out>
        __func__ = <optimized out>
        __errno_location = <optimized out>
        __errno_location = <optimized out>
        resultoid = <optimized out>
        datum = <optimized out>
        isNull = <optimized out>
        __errno_location = <optimized out>
        relkind = <optimized out>
        datum = <optimized out>
        isNull = <optimized out>
        __errno_location = <optimized out>
        __errno_location = <optimized out>
        relation = <optimized out>
        __errno_location = <optimized out>
        __errno_location = <optimized out>
        lc__state = <optimized out>
#3  hypertable_modify_exec (node=<optimized out>) at /usr/src/timescaledb-2-postgresql-17-2.17.2~ubuntu24.04/src/nodes/hypertable_modify.c:171
        mtstate = <optimized out>
#4  0x00005f95ffd9e3cd in ExecProcNode (node=0x5f9602481720) at /usr/src/percona-postgresql-17-2:17.0-1.noble/build/../src/include/executor/executor.h:274
No locals.
#5  ExecutePlan (execute_once=<optimized out>, dest=0x5f960046d460 <donothingDR.lto_priv.0>, direction=<optimized out>, numberTuples=0, sendTuples=false, operation=CMD_UPDATE, use_parallel_mode=<optimized out>, 
    planstate=0x5f9602481720, estate=0x5f9602480ce8) at /usr/src/percona-postgresql-17-2:17.0-1.noble/build/../src/backend/executor/execMain.c:1654
        slot = <optimized out>
        current_tuple_count = 0
        slot = <optimized out>
        current_tuple_count = <optimized out>
#6  standard_ExecutorRun (queryDesc=0x5f96023e10d8, direction=<optimized out>, count=0, execute_once=<optimized out>) at /usr/src/percona-postgresql-17-2:17.0-1.noble/build/../src/backend/executor/execMain.c:365
        estate = 0x5f9602480ce8
        operation = CMD_UPDATE
        dest = 0x5f960046d460 <donothingDR.lto_priv.0>
        sendTuples = <optimized out>
        oldcontext = 0x5f96023e0f10
        __func__ = "standard_ExecutorRun"
#7  0x00005f95fff8c5e5 in ExecutorRun (execute_once=true, count=0, direction=ForwardScanDirection, queryDesc=0x5f96023e10d8)
    at /usr/src/percona-postgresql-17-2:17.0-1.noble/build/../src/backend/executor/execMain.c:306

Log output:

2024-11-27 16:42:48.941 UTC [1206] LOG:  server process (PID 2175) was terminated by signal 11: Segmentation fault
2024-11-27 16:42:48.941 UTC [1206] DETAIL:  Failed process was running: update "a" set "value" = $1 where "id" = $2


### How can we reproduce the bug?


This is an overview, might not reproduce the issue, but a generic description what triggered the issue on my side:

- Approximate table schema is included in the description. 
- Precreate chunks by inserting rows for each day in the past year
- Insert a row with created_at_date = '2024-09-06T00:00:00.000Z'
- Try to update the row.

@timopulkkinen
Copy link
Author

Changed UPDATE to INSERT... ON CONFLICT DO UPDATE and the segfault does not happen anymore.

@renci-zz
Copy link

renci-zz commented Dec 3, 2024

I have the same problem, and I drop database and recreate database, reinstall timescaledb, the issue still, finally I reinstall operation system and pgsql, timescaledb, the issue does not happen.

@erimatnor erimatnor self-assigned this Dec 4, 2024
@erimatnor
Copy link
Contributor

@timopulkkinen I created the following test script in order to try and reproduce the issue:

create table aa (id bigint primary key);
create table ab (id int primary key, reference text);
create table ac (id text primary key);

insert into aa values (1);
insert into ab values (1, 'test');
insert into ac values ('1');

create table a (
  id bigserial not null,
  foreign_key_aa bigint not null references aa,
  foreign_key_ab int not null references ab,
  foreign_key_ac text not null references ac,
  id2 bigint,
  reference text not null,
  value int,
  created_at timestamptz not null default now(),
  created_at_date timestamptz not null default date_trunc('day', now() AT TIME ZONE 'UTC'),
  updated_at timestamptz not null default now(),
  unique(foreign_key_ab, id2, created_at_date),
  PRIMARY KEY (id, created_at_date)
);

SELECT create_hypertable('a', 'created_at_date', chunk_time_interval => INTERVAL '1 week');

ALTER TABLE a SET (
  timescaledb.compress,
  timescaledb.compress_orderby = 'created_at_date, reference, id',
  timescaledb.compress_segmentby = 'foreign_key_aa, foreign_key_ab, foreign_key_ac, id2'
);


insert into a (foreign_key_aa, foreign_key_ab, foreign_key_ac, id2, reference, value, created_at, created_at_date, updated_at)
select 1, 1, '1', (random()*50)::int, 'test', (random()*30)::int, t, date_trunc('day', t), t
from generate_series(timestamptz '2024-01-01 00:00:00',
                  timestamptz '2024-12-31 00:00:00',
             interval '1 day') t;

insert into a values (1, 1, 1, '1', 1, 'test', 1, '2024-01-01', '2024-01-01', '2024-01-01');

CREATE INDEX a_foreign_key_aa_idx ON a(foreign_key_aa);
CREATE INDEX a_id2 ON a(id2);
CREATE INDEX a_foreign_key_ab_idx ON ab(id, reference);


--SELECT add_compression_policy('a', INTERVAL '6 months');
select count(compress_chunk(ch)) from show_chunks('a') ch;
update a set value = 0  where id = 1;

However, I can't get it to crash. Maybe there is something missing? Can you try the script on your environment or try to provide a test setup that reproduces the crash?

@redbaron
Copy link

redbaron commented Dec 5, 2024

1.16.1 and 15.9 Postgres:

(gdb) bt
#0  0x00007945c27b6c1b in list_nth (n=<optimized out>, list=0x5a591a709b28) at /usr/include/postgresql/15/server/nodes/pg_list.h:281
#1  ExecInitUpdateProjection (mtstate=mtstate@entry=0x5a591a4b7148, resultRelInfo=resultRelInfo@entry=0x5a591a619a58) at ./src/nodes/hypertable_modify.c:1186
#2  0x00007945c27b91d3 in ExecModifyTable (pstate=<optimized out>, cs_node=<optimized out>) at ./src/nodes/hypertable_modify.c:875
#3  hypertable_modify_exec (node=<optimized out>) at ./src/nodes/hypertable_modify.c:171
#4  0x00005a59186a6c73 in ExecProcNode (node=0x5a591a4b6cd8) at ./build/../src/include/executor/executor.h:262
#5  ExecutePlan (execute_once=<optimized out>, dest=0x5a5918bfaaa0 <donothingDR>, direction=<optimized out>, numberTuples=0, sendTuples=false, operation=CMD_UPDATE, 
    use_parallel_mode=<optimized out>, planstate=0x5a591a4b6cd8, estate=0x5a591a4b6838) at ./build/../src/backend/executor/execMain.c:1645
#6  standard_ExecutorRun (queryDesc=0x5a591a58ee48, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:366
#7  0x00007945cc70243d in pgss_ExecutorRun (queryDesc=0x5a591a58ee48, direction=ForwardScanDirection, count=0, execute_once=<optimized out>)
    at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:1010
#8  0x00007945cc6ee6fa in explain_ExecutorRun (queryDesc=0x5a591a58ee48, direction=ForwardScanDirection, count=0, execute_once=<optimized out>)
    at ./build/../contrib/auto_explain/auto_explain.c:320
#9  0x00005a591881ea69 in ProcessQuery (plan=0x5a591a6dba18, 
    sourceText=0x5a591a58ec38 "UPDATE event SET identity_id = $1 WHERE identity_id = $2 AND \"name\" <> '_identityCreate'", params=0x5a591a58ed18, queryEnv=0x0, 
    dest=0x5a5918bfaaa0 <donothingDR>, qc=0x7ffce43e98c0) at ./build/../src/backend/tcop/pquery.c:160
#10 0x00005a591881f679 in PortalRunMulti (portal=portal@entry=0x5a591a4487b8, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, 
    dest=0x5a5918bfaaa0 <donothingDR>, dest@entry=0x5a591a38d178, altdest=0x5a5918bfaaa0 <donothingDR>, altdest@entry=0x5a591a38d178, qc=qc@entry=0x7ffce43e98c0)
    at ./build/../src/backend/tcop/pquery.c:1277
#11 0x00005a591881fb3e in PortalRun (portal=portal@entry=0x5a591a4487b8, count=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=<optimized out>, 
    dest=dest@entry=0x5a591a38d178, altdest=altdest@entry=0x5a591a38d178, qc=<optimized out>) at ./build/../src/backend/tcop/pquery.c:791
#12 0x00005a591881d21a in exec_execute_message (max_rows=<optimized out>, portal_name=0x5a591a38cc88 "") at ./build/../src/backend/tcop/postgres.c:2230
#13 PostgresMain (dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4747
#14 0x00005a5918799ff1 in BackendRun (port=0x5a591a3dcce0) at ./build/../src/backend/postmaster/postmaster.c:4514
#15 BackendStartup (port=0x5a591a3dcce0) at ./build/../src/backend/postmaster/postmaster.c:4242
#16 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1809
#17 0x00005a591879afc5 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x5a591a34ada0) at ./build/../src/backend/postmaster/postmaster.c:1481
#18 0x00005a591850029b in main (argc=3, argv=0x5a591a34ada0) at ./build/../src/backend/main/main.c:202

@timopulkkinen
Copy link
Author

timopulkkinen commented Dec 5, 2024

@erimatnor Modified your script a bit and was able to reproduce the issue in our environment. Notably the issue didn't occur with update a set value = 0 where id = 1.

create table aa (id bigint primary key);
create table ab (id int primary key, reference text);
create table ac (id text primary key);


insert into aa values (1);
insert into ab values (1, 'test');
insert into ac values ('1');

create table a (
  id bigserial not null,
  foreign_key_aa bigint not null references aa,
  foreign_key_ab int not null references ab,
  foreign_key_ac text not null references ac,
  id2 bigint,
  reference text not null,
  value int,
  created_at timestamptz not null default now(),
  created_at_date timestamptz not null default date_trunc('day', now() AT TIME ZONE 'UTC'),
  updated_at timestamptz not null default now(),
  unique(foreign_key_ab, id2, created_at_date),
  PRIMARY KEY (id, created_at_date)
);


SELECT create_hypertable('a', 'created_at_date', chunk_time_interval => INTERVAL '1 week');

ALTER TABLE a SET (
  timescaledb.compress,
  timescaledb.compress_orderby = 'created_at_date, reference, id',
  timescaledb.compress_segmentby = 'foreign_key_aa, foreign_key_ab, foreign_key_ac, id2'
);


insert into a (foreign_key_aa, foreign_key_ab, foreign_key_ac, id2, reference, value, created_at, created_at_date, updated_at)
select 1, 1, '1', (random()*50)::int, 'test', (random()*30)::int, t, date_trunc('day', t), t
from generate_series(timestamptz '2025-01-01 00:00:00',
                  timestamptz '2025-12-31 00:00:00',
             interval '1 day') t;

CREATE INDEX a_foreign_key_aa_idx ON a(foreign_key_aa);
CREATE INDEX a_id2 ON a(id2);
CREATE INDEX a_foreign_key_ab_idx ON ab(id, reference);

select count(compress_chunk(ch)) from show_chunks('a') ch;

update a set value = 0  where id = 30;

@erimatnor
Copy link
Contributor

@timopulkkinen I still can't get the script to cause a crash on my local setup. I tried TimescaleDB 2.16.1 on both PG15.0 and PG17.2. I also tried TimescaleDB 2.17.1 on those setups. I tried different IDs in the update.

@renci-zz
Copy link

renci-zz commented Dec 6, 2024

My data is from old timescaleDB version update to new version (2.14 to 2.15.3 or 2.15.3 to 2.16, I don't remember), and export dump , import dump to new server. and my system is hyper-v virtual machine, also used export and import to install.
Maybe it's related to this.

@timopulkkinen
Copy link
Author

timopulkkinen commented Dec 6, 2024

@erimatnor Tried this with the TimescaleDB Docker image locally and wasn't able to make it crash either, not even when I tried to replicate the config that we have on the actual server.

However, with the following Dockerfile and configuration the crash occurs at least for me locally.

In the Dockerfile platform is set to amd64.

crash_docker.zip

@timopulkkinen
Copy link
Author

timopulkkinen commented Dec 10, 2024

@erimatnor I debugged this further - but first, I noticed my Docker package was a bit half-baked. Included a better version.
better_docker.zip

Anyways, I was able to pinpoint the issue to be with 17.0.1 version of Postgres (latest Percona release). When I downgraded to 16.x, the segfault issue disappeared. The same goes for TimescaleDB Docker image, which runs 17.2.

I rebuilt our cluster using Postgres maintained packages for 17.2 and ... and as expected, no segfault anymore.

@kpan2034
Copy link
Contributor

@timopulkkinen when I ran the script with the Docker package you provided, I got a segfault once, and the following error on multiple other attempts to replicate it:

ERROR: targetColnos does not match subplan target list

I'll look into it further.

@fabriziomello fabriziomello added segfault Segmentation fault hypertable labels Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants