Releases · pganalyze/collector
v0.40.0
- Update to pg_query_go v2.0.4
- Normalize: Don't touch "GROUP BY 1" and "ORDER BY 1" expressions, keep original text
- Fingerprint: Cache list item hashes to fingerprint complex queries faster
(this change also significantly reduces memory usage for complex queries)
- Install script: Support CentOS in addition to RHEL
v0.39.0
- Docker: Use Docker's USER instruction to set the user, to support running as non-root
- This enables the collector container to run in environments that require the
whole container to run as a non-root user, which previously was not the case.
- For compatibility reasons the container can still be run as root explicitly,
in which case the setpriv command is used to drop privileges. setpriv replaces
gosu since it's available for installation in most distributions directly, and
fulfills the same purpose here.
- Selfhosted: Support running log discovery with non-localhost db_host settings
- Previously this was prevented by a fixed check against localhost/127.0.0.1,
but sometimes one wants to refer to the local server by a non-local IP address
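For illustration, a minimal sketch of this setup in the collector's INI config, with hypothetical server name and addresses (the collector runs on the database server itself, but refers to it by a routable IP):

```ini
# pganalyze-collector.conf (sketch; names and addresses are made up)
[local_via_ip]
db_host = 10.0.1.5         # non-localhost address that refers to the local server
db_name = mydb
db_username = pganalyze
# Log discovery now runs for this server as well, instead of requiring
# db_host to be localhost/127.0.0.1
```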
- AWS: Add support for AssumeRoleWithWebIdentity
- This is useful when running the collector inside EKS in order to access
AWS resources, as recommended by AWS: https://docs.aws.amazon.com/eks/latest/userguide/specify-service-account-role.html
- Statement stats retrieval: Get all rows first, before fingerprinting queries
- This avoids showing a bogus ClientWrite event on the Postgres server side whilst
the collector is running the fingerprint method. There is a trade-off here,
because we now need to retrieve all statement texts (for the full snapshot) before
doing the fingerprint, leading to a slight increase in memory usage. Nonetheless,
this improves debuggability, and avoids bogus statement timeout issues.
- Track additional meta information about guided setup failures
- Fix reporting of replication statistics for more than 1 follower
v0.38.1
- Update to pg_query_go 2.0.2
- Normalize: Fix handling of two subsequent DefElems (resolves rare crashes)
- Redact primary_conninfo setting if present and readable
- This can contain sensitive information (full connection string to the
primary), and pganalyze does not do anything with it right now. In the
future, we may partially redact this and use primary hostname
information, but for now, just fully redact it.
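As an illustration of why this is redacted, primary_conninfo on a standby typically looks something like the following (hypothetical values):

```ini
# postgresql.conf on a standby (illustrative values only)
primary_conninfo = 'host=primary.internal port=5432 user=replicator password=SECRET sslmode=require'
```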
v0.38.0
- Update to pg_query 2.0 and Postgres 13 parser
- This is a major upgrade in terms of supported syntax (Postgres 10 to 13),
as well as a major change in the fingerprints, which are now shorter and
not compatible with the old format.
- When you upgrade to this version of the collector you will see a break
in statistics, that is, you will see new query entries in pganalyze after
adopting this version of the collector.
- Amazon RDS: Support long log events beyond 2,000 lines
- Resolves edge cases where very long EXPLAIN plans would be ignored since
they exceeded the previous 2,000-line limit
- We now ensure that we go back up to 10 MB in the file with each log
download that happens, with support for log events that exceed the RDS API
page size limit of 10,000 log lines
- Self-managed: Also check for the process name "postmaster" when looking for
Postgres PID (fixes data directory detection for RHEL-based systems)
v0.37.1
- Docker builds: Increase stack size to 2MB to prevent rare crashes
- Alpine has a very small stack size by default (80kb), which is less than
the default that Postgres expects (100kb). Since there is no good reason
to reduce it to such a small amount, increase it to the common Linux
default of 2MB stack size.
- This would have surfaced as a hard crash of the Docker container with
error code 137 or 139, easily confused with out-of-memory errors, but
clearly distinct from them.
- Reduce timeout for accessing EC2 instance metadata service
- Previously we were re-using our shared HTTP client, which has a rather
high timeout (120 seconds) that causes the HTTP client to wait around
for a long time. This is generally intentional (since it includes the
time spent downloading a request body), but is a bad idea when running
into EC2's IMDSv2 service that has a network-hop based limit. If that
hop limit is exceeded, the requests just go to nowhere, causing the
client to wait for a multiple of 120 seconds (~10 minutes were observed).
- Don't use pganalyze query marker for "--test-explain" command
- The marker means the resulting query gets hidden from the EXPLAIN plan
list, which we don't want for this test query - it's intentional
that we can see the EXPLAIN plan we're generating for the test.
v0.37.0
- Add support for receiving logs from remote servers over syslog
- You can now specify the new "db_log_syslog_server" config setting, or
"LOG_SYSLOG_SERVER" environment variable in order to setup the collector
as a syslog server that can receive logs from a remote server via syslog
to the server that runs the collector. - Note that the format of this setting is "listen_address:port", and its
recommended to use a high port number to avoid running the collector as root. - For example, you can specify "0.0.0.0:32514" and then send syslog messages
to the collector's server address at port 32514. - Note that you need to use protocol RFC5424, with an unencrypted TCP
connection. Due to syslog not being an authenticated protocol it is
recommended to only use this integration over private networks.
- You can now specify the new "db_log_syslog_server" config setting, or
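Putting the above together, a minimal sketch of the collector config (the section name and connection settings are hypothetical; the syslog setting and port come from the notes above):

```ini
# pganalyze-collector.conf (sketch)
[remote_server]
db_host = 10.0.0.12
db_name = mydb
db_username = pganalyze
# Listen on all interfaces, port 32514, for RFC5424 syslog messages over
# unencrypted TCP (only use this over private networks)
db_log_syslog_server = 0.0.0.0:32514
```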
- Add support for "pid=%p,user=%u,db=%d,app=%a,client=%h " and
"user=%u,db=%d,app=%a,client=%h " log_line_prefix settings- This prefix misses a timestamp, but is useful when sending data over syslog.
- Log parsing: Correctly handle %a containing commas/square brackets
- Note that this does not support all cases since Go's regexp engine
does not support negative lookahead, so we can't handle an application
name containing a comma if the log_line_prefix has a comma following %a.
- Ignore CSV log files in log directory #83
- Some Postgres installations are configured to log both standard-format
log files and CSV log files to the same directory, but the collector
previously read all files specified in a db_log_location, which worked
poorly with this setup.
- Tweak collector sample config file to match setup instructions
- Improvements to "--discover-log-location"
- Don't keep running if there's a config error
- Drop the log_directory helper command and just fetch the setting from Postgres
- Warn and only show relative location if log_directory is inside
the data directory (this requires special setup steps to resolve)
- Improvements to "--test-logs"
- Run privilege drop test when running log test as root, to allow running
"--test-logs" for a complete log setup test, avoiding the need to run
a full "--test"
- Update pg_query_go to incorporate memory leak fixes
- Check whether pg_stat_statements exists in a different schema, and give a
clear error message
- Drop support for Postgres 9.2
- Postgres 9.2 has been EOL for almost 4 years
- Update to Go 1.16
- This introduces a change to Go's certificate handling, which may break
certain older versions of Amazon RDS certificates, as they do not
include a SAN. When this is the case you will see an error message like
"x509: certificate relies on legacy Common Name field". - As a temporary workaround you can run the collector with the
GODEBUG=x509ignoreCN=0 environment setting, which ignores these incorrect
fields in these certificates. For a permanent fix, you need to update
your RDS certificates to include the correct SAN field: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL-certificate-rotation.html
v0.36.0
- Config parsing improvements:
- Fail fast when pganalyze section is missing in config file
- Ignore duplicates in db_name config setting
- Previously this could cause malformed snapshots that would be submitted
correctly but could not be processed
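For instance, with hypothetical database names (assuming the comma-separated list form of db_name), a duplicated entry like this is now ignored rather than producing a malformed snapshot:

```ini
# pganalyze-collector.conf (sketch)
[server1]
db_name = mydb, reporting, mydb   # the duplicate "mydb" is now ignored
```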
- Validate db_url parsing to avoid collector crash with invalid URLs
- Include pganalyze-collector-setup program (see 0.35 release notes) in supported packages
- Rename <unidentified queryid> query text placeholder to <query text unavailable>
- This makes it clearer what the underlying issue is
- Revert to using <truncated query> instead of <unparsable query> in some situations
- When a query is cut off due to pg_stat_activity limit being reached,
show <truncated query>, to make it clear that increasing track_activity_query_size
would solve the issue
- Ignore I/O stats for AWS Aurora utility statements
- AWS Aurora appears to report incorrect blk_read_time and blk_write_time values
for utility statements (i.e., non-SELECT/INSERT/UPDATE/DELETE); we zero these out for now
- Fix log-based EXPLAIN bug where query samples could be dropped if EXPLAIN failed
- Add U140 log event (inconsistent range bounds)
- e.g.: ERROR: range lower bound must be less than or equal to range upper bound
- Fix issue where incomplete schema information in snapshots was not marked correctly
- This could lead to schema objects disappearing and being re-created
- Fix trailing newline handling for GCP and self-hosted log streams
- This could lead to queries being poorly formatted in the UI, or some queries
with single-line comments being ignored
- Include additional collector configuration settings in snapshot metadata for diagnostics
- Ignore "insufficient privilege" queries w/o queryid
- Previously, these could all be aggregated together yielding misleading stats
v0.35.0
- Add new "pganalyze-collector-setup" program that streamlines collector installation
- This is initially targeted for self-managed servers to make it easier to set up
the collector and required configuration settings for a locally running Postgres
server
- To start, this supports the following environments:
- Postgres 10 and newer, running on the same server as the collector
- Ubuntu 14.04 and newer
- Debian 10 and newer
- Collector test: Show server URLs to make it easier to access the servers in
pganalyze after the test
- Collector test+reload: In case of errors, return exit code 1
- Ignore manual vacuums if the collector can't access pg_stat_progress_vacuum
- Don't run log test for Heroku, instead provide info message
- Also fixes "Unsupported log_line_prefix setting: ' sql_error_code = %e '"
error on Heroku Postgres
- Also fixes "Unsupported log_line_prefix setting: ' sql_error_code = %e '"
- Add pganalyze system user to adm group in Debian/Ubuntu packages
- This gives the collector permission to read Postgres log files in a default
install, simplifying Log Insights setup
- Handle NULL parameters for query samples correctly
- Add a skip_if_replica / SKIP_IF_REPLICA option (#117)
- You can use this to configure the collector in a no-op mode on
replicas (we only check whether the monitored database is a replica), and
automatically switch to active monitoring when the database is no
longer a replica.
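A hedged sketch of how this might look in the config file (section and connection settings are hypothetical; the true/false syntax is an assumption based on the collector's other boolean settings):

```ini
# pganalyze-collector.conf (sketch)
[standby1]
db_host = 10.0.0.21
db_name = mydb
db_username = pganalyze
# No-op while this server is a replica; monitoring starts automatically
# after promotion
skip_if_replica = true
```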
- Stop building packages for CentOS 6 and Ubuntu 14.04 (Trusty)
- Both of these systems are now end of life, and the remaining survivor
of the CentOS 6 line (Amazon Linux 1) will be EOL on December 31st 2020.
v0.34.0
- Check and report problematic log collection settings
- Some Postgres settings almost always cause a drastic increase in log
volume for little actual benefit. They tend to cause operational problems
for the collector (due to the load of additional log parsing) and the
pganalyze service itself (or indeed, likely for any service that would
process collector snapshots), and do not add any meaningful insights.
Furthermore, we found that these settings are often turned on
accidentally.
- To avoid these issues, add some client-side checks in the collector to
disable log processing if any of the problematic settings are on.
- The settings in question are:
- log_min_duration_statement less than 10ms
- log_statement set to 'all'
- log_duration set to 'on'
- log_error_verbosity set to 'verbose'
- If any of these are set to these unsupported values, all log collection will be
disabled for that server. The settings are re-checked every full snapshot, and can be
explicitly re-checked with a collector reload.
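For reference, postgresql.conf values that stay within these limits look like this (example values based on the thresholds above, not tuning recommendations):

```ini
# postgresql.conf - settings that keep collector log processing enabled
log_min_duration_statement = 1000   # must be at least 10ms (here: 1s); -1 disables
log_statement = 'none'              # 'all' disables log collection
log_duration = off                  # 'on' disables log collection
log_error_verbosity = default       # 'verbose' disables log collection
```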
- Log Insights improvements
- Self-managed server: Process logs every 3 seconds, instead of on-demand
- Self-managed server: Improve handling of multi-line log events
- Google Cloud SQL: Always acknowledge Pub Sub messages, even if collector doesn't handle them
- Optimize stitching logic for reduced CPU consumption
- Explicitly close temporary files to avoid running out of file descriptors
- Multiple changes to improve debugging in support situations
- Report collector config in full snapshot
- This reports certain collector config settings (except for passwords/keys/credentials)
to the pganalyze servers to help with debugging.
- Print collector version at beginning of test for better support handling
- Print collection status and Postgres version before submitting snapshots
- Change panic stack trace logging from Verbose to Warning
- Add support for running the collector on ARM systems
- Note that we don't provide packages yet, but with this the collector
can be built on ARM systems without any additional patches.
- Introduce API system scope fallback
- This fallback is intended to allow changing the API scope, either based
on user configuration (e.g. moving the collector between different
cloud provider accounts), or because of changes in the collector's system
identification logic.
- The new "api_system_scope_fallback" / PGA_API_SYSTEM_SCOPE_FALLBACK config
variable is intended to be set to the old value of the scope. When the
pganalyze backend receives a snapshot with a fallback scope set, and there
is no server created with the regular scope, it will first search the
servers with the fallback scope. If found, that server's scope will be
updated to the (new) regular scope. If not found, a new server will be
created with the regular scope. The main goal of the fallback scope is to
avoid creating a duplicate server when changing the scope value.
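A sketch of a scope migration using this mechanism (the api_system_scope setting name for the regular scope is an assumption here, and the scope values are made up):

```ini
# pganalyze-collector.conf (sketch)
[pganalyze]
api_key = YOUR_API_KEY
# New (regular) scope after moving the collector to another account
api_system_scope = aws-account-b
# Old scope, so the pganalyze backend can find and migrate the existing server
api_system_scope_fallback = aws-account-a
```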
- Use new fallback scope mechanism to change scope for RDS databases
- Previously we identified RDS databases by their ID and region only, but
the ID does not have to be unique within a region, it only has to be
unique within the same AWS account in that region. Thus, adjust the
scope to include both the region and AWS Account ID (if configured or
auto-detected), and use the fallback scope mechanism to migrate existing
servers.
- Add support for GKE workload identity (Yash Bhutwala, #91)
- Add support for assuming AWS instance roles
- Set the role to be assumed using the new aws_assume_role / AWS_ASSUME_ROLE
configuration setting. This is useful when the collector runs in a different
AWS account than your database.
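For example, a hedged sketch with a made-up role ARN and instance ID (aws_db_instance_id as the setting for identifying the RDS instance is an assumption):

```ini
# pganalyze-collector.conf (sketch)
[rds_server]
aws_db_instance_id = my-rds-instance
# Role in the database's AWS account that the collector assumes
aws_assume_role = arn:aws:iam::123456789012:role/pganalyze-monitoring
```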
v0.33.1
- Ignore internal admin databases for GCP and Azure
- This avoids collecting data from these internal databases, which produces
unnecessary errors when using the all databases setting.
- Add log_line_prefix check to GCP self-test
- Schema stats handling: Avoid crash due to nil pointer dereference
- Add support for "%m [%p]: [%l-1] db=%d,user=%u " log_line_prefix