Releases: pganalyze/collector

v0.40.0

30 Jun 17:09
  • Update to pg_query_go v2.0.4
    • Normalize: Don't touch "GROUP BY 1" and "ORDER BY 1" expressions, keep original text
    • Fingerprint: Cache list item hashes to fingerprint complex queries faster
      (this change also significantly reduces memory usage for complex queries)
  • Install script: Support CentOS in addition to RHEL

v0.39.0

01 Jun 07:49
  • Docker: Use Docker's USER instruction to set the user, to support running as non-root
    • This enables the collector container to run in environments that require the
      whole container to run as a non-root user, which previously was not possible.
    • For compatibility reasons the container can still be run as root
      explicitly, in which case the setpriv command is used to drop privileges.
      setpriv replaces gosu since it's available directly in most
      distributions and fulfills the same purpose here.
  • Selfhosted: Support running log discovery with non-localhost db_host settings
    • Previously this was prevented by a fixed check against localhost/127.0.0.1,
      but sometimes one wants to refer to the local server by a non-local IP address
  • AWS: Add support for AssumeRoleWithWebIdentity
  • Statement stats retrieval: Get all rows first, before fingerprinting queries
    • This avoids showing a bogus ClientWrite event on the Postgres server side whilst
      the collector is running the fingerprint method. There is a trade-off here,
      because we now need to retrieve all statement texts (for the full snapshot) before
      doing the fingerprint, leading to a slight increase in memory usage. Nonetheless,
      this improves debuggability, and avoids bogus statement timeout issues.
  • Track additional meta information about guided setup failures
  • Fix reporting of replication statistics for more than 1 follower

v0.38.1

03 Apr 17:32
  • Update to pg_query_go 2.0.2
    • Normalize: Fix handling of two subsequent DefElems (resolves rare crashes)
  • Redact primary_conninfo setting if present and readable
    • This can contain sensitive information (full connection string to the
      primary), and pganalyze does not do anything with it right now. In the
      future, we may partially redact this and use primary hostname
      information, but for now, just fully redact it.

v0.38.0

31 Mar 21:26
  • Update to pg_query 2.0 and Postgres 13 parser
    • This is a major upgrade in terms of supported syntax (Postgres 10 to 13),
      as well as a major change in the fingerprints, which are now shorter and
      not compatible with the old format.
    • When you upgrade to this version of the collector you will see a break
      in statistics; that is, you will see new query entries in pganalyze
      after adopting this version of the collector.
  • Amazon RDS: Support long log events beyond 2,000 lines
    • Resolves edge cases where very long EXPLAIN plans would be ignored since
      they exceeded the previous 2,000-line limit
    • We now ensure that we go back up to 10 MB in the file with each log
      download that happens, with support for log events that exceed the RDS API
      page size limit of 10,000 log lines
  • Self-managed: Also check for the process name "postmaster" when looking for
    Postgres PID (fixes data directory detection for RHEL-based systems)

v0.37.1

17 Mar 05:29
  • Docker builds: Increase stack size to 2 MB to prevent rare crashes
    • Alpine has a very small default stack size (80 KB), which is less than
      the default that Postgres expects (100 KB). Since there is no good
      reason to keep it that small, increase it to the common Linux default
      of a 2 MB stack size.
    • This would have surfaced as a hard crash of the Docker container with
      error code 137 or 139, easily confused with out-of-memory errors, but
      clearly distinct from them.
  • Reduce timeout for accessing the EC2 instance metadata service
    • Previously we were re-using our shared HTTP client, which has a rather
      high timeout (120 seconds) that causes the HTTP client to wait around
      for a long time. This is generally intentional (since it includes the
      time spent downloading a response body), but is a bad idea when running
      into EC2's IMDSv2 service, which has a network-hop-based limit. If that
      hop limit is exceeded, the requests simply go nowhere, causing the
      client to wait for a multiple of 120 seconds (~10 minutes were observed).
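    • A common trigger is running the collector in a container on EC2, since
      IMDSv2's default hop limit of 1 does not allow the extra network hop.
      If that applies, one remedy (a sketch; the instance ID is illustrative)
      is to raise the hop limit via the AWS CLI:

        aws ec2 modify-instance-metadata-options \
          --instance-id i-0123456789abcdef0 \
          --http-put-response-hop-limit 2 \
          --http-endpoint enabled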
  • Don't use the pganalyze query marker for the "--test-explain" command
    • The marker causes the resulting query to be hidden from the EXPLAIN plan
      list, which is not what we want for this test query - it's intentional
      that the EXPLAIN plan we generate for the test is visible.

v0.37.0

19 Feb 19:19
  • Add support for receiving logs from remote servers over syslog
    • You can now specify the new "db_log_syslog_server" config setting, or the
      "LOG_SYSLOG_SERVER" environment variable, to set up the collector as a
      syslog server that can receive logs from a remote server via syslog.
    • Note that the format of this setting is "listen_address:port", and it's
      recommended to use a high port number to avoid running the collector
      as root.
    • For example, you can specify "0.0.0.0:32514" and then send syslog
      messages to the collector's server address at port 32514.
    • Note that you need to use protocol RFC5424, with an unencrypted TCP
      connection. Since syslog is not an authenticated protocol, it is
      recommended to only use this integration over private networks.
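    • A minimal config sketch (server section name, addresses and credentials
      are illustrative):

        [pganalyze]
        api_key = YOUR_API_KEY

        [remote_db]
        db_host = 10.1.2.3
        db_name = mydb
        db_username = pganalyze
        db_log_syslog_server = 0.0.0.0:32514

      On the remote Postgres server, an rsyslog forwarding rule along these
      lines (target host illustrative) sends RFC5424-formatted messages over
      TCP:

        action(type="omfwd" target="collector.internal" port="32514"
               protocol="tcp" template="RSYSLOG_SyslogProtocol23Format")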
  • Add support for "pid=%p,user=%u,db=%d,app=%a,client=%h " and
    "user=%u,db=%d,app=%a,client=%h " log_line_prefix settings
    • This prefix lacks a timestamp, but is useful when sending data over syslog.
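    • For example, in postgresql.conf (note the trailing space inside the
      quotes):

        log_line_prefix = 'user=%u,db=%d,app=%a,client=%h '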
  • Log parsing: Correctly handle %a containing commas/square brackets
    • Note that this does not support all cases since Go's regexp engine
      does not support negative lookahead, so we can't handle an application
      name containing a comma if the log_line_prefix has a comma following %a.
  • Ignore CSV log files in log directory #83
    • Some Postgres installations are configured to log both standard-format
      log files and CSV log files to the same directory, but the collector
      previously read all files specified in db_log_location, which worked
      poorly with this setup.
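    • This addresses setups along these lines in postgresql.conf, where both
      stderr-format and CSV logs end up in the same directory:

        logging_collector = on
        log_destination = 'stderr,csvlog'
        log_directory = 'log'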
  • Tweak collector sample config file to match setup instructions
  • Improvements to "--discover-log-location"
    • Don't keep running if there's a config error
    • Drop the log_directory helper command and just fetch the setting from Postgres
    • Warn and only show relative location if log_directory is inside
      the data directory (this requires special setup steps to resolve)
  • Improvements to "--test-logs"
    • Run privilege drop test when running log test as root, to allow running
      "--test-logs" for a complete log setup test, avoiding the need to run
      a full "--test"
  • Update pg_query_go to incorporate memory leak fixes
  • Check whether pg_stat_statements exists in a different schema, and give a
    clear error message
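    • To check manually, a query along these lines (illustrative) shows which
      schema the extension lives in:

        SELECT extname, nspname
          FROM pg_extension
          JOIN pg_namespace ON pg_namespace.oid = pg_extension.extnamespace
         WHERE extname = 'pg_stat_statements';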
  • Drop support for Postgres 9.2
    • Postgres 9.2 has been EOL for almost 4 years
  • Update to Go 1.16
    • This introduces a change to Go's certificate handling, which may break
      connections using certain older Amazon RDS certificates, as they do not
      include a SAN (Subject Alternative Name). When this is the case you
      will see an error message like "x509: certificate relies on legacy
      Common Name field".
    • As a temporary workaround you can run the collector with the
      GODEBUG=x509ignoreCN=0 environment setting, which restores the legacy
      behavior of falling back to the Common Name field for these
      certificates. For a permanent fix, you need to update your RDS
      certificates to include the correct SAN field: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL-certificate-rotation.html
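    • For example, for a one-off run (binary name as installed by the packages):

        GODEBUG=x509ignoreCN=0 pganalyze-collector --test

      or persistently via a systemd override (a sketch, assuming the packaged
      service name):

        sudo systemctl edit pganalyze-collector
        # then add:
        [Service]
        Environment=GODEBUG=x509ignoreCN=0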

v0.36.0

22 Jan 06:22
  • Config parsing improvements:
    • Fail fast when pganalyze section is missing in config file
    • Ignore duplicates in db_name config setting
      • Previously this could cause malformed snapshots that would be submitted
        correctly but could not be processed
    • Validate db_url parsing to avoid collector crash with invalid URLs
  • Include pganalyze-collector-setup program (see 0.35 release notes) in supported packages
  • Rename <unidentified queryid> query text placeholder to <query text unavailable>
    • This makes it clearer what the underlying issue is
  • Revert to using <truncated query> instead of <unparsable query> in some situations
    • When a query is cut off due to pg_stat_activity limit being reached,
      show <truncated query>, to make it clear that increasing track_activity_query_size
      would solve the issue
  • Ignore I/O stats for AWS Aurora utility statements
    • AWS Aurora appears to report incorrect blk_read_time and blk_write_time values
      for utility statements (i.e., non-SELECT/INSERT/UPDATE/DELETE); we zero these out for now
  • Fix log-based EXPLAIN bug where query samples could be dropped if EXPLAIN failed
  • Add U140 log event (inconsistent range bounds)
    • e.g.: ERROR: range lower bound must be less than or equal to range upper bound
  • Fix issue where incomplete schema information in snapshots was not marked correctly
    • This could lead to schema objects disappearing and being re-created
  • Fix trailing newline handling for GCP and self-hosted log streams
    • This could lead to queries being poorly formatted in the UI, or some queries
      with single-line comments being ignored
  • Include additional collector configuration settings in snapshot metadata for diagnostics
  • Ignore "insufficient privilege" queries w/o queryid
    • Previously, these could all be aggregated together yielding misleading stats

v0.35.0

06 Dec 04:28
  • Add new "pganalyze-collector-setup" program that streamlines collector installation
    • This is initially targeted for self-managed servers to make it easier to set up
      the collector and required configuration settings for a locally running Postgres
      server
    • To start, this supports the following environments:
      • Postgres 10 and newer, running on the same server as the collector
      • Ubuntu 14.04 and newer
      • Debian 10 and newer
  • Collector test: Show server URLs to make it easier to access the servers in
    pganalyze after the test
  • Collector test+reload: In case of errors, return exit code 1
  • Ignore manual vacuums if the collector can't access pg_stat_progress_vacuum
  • Don't run log test for Heroku, instead provide info message
    • Also fixes "Unsupported log_line_prefix setting: ' sql_error_code = %e '"
      error on Heroku Postgres
  • Add pganalyze system user to adm group in Debian/Ubuntu packages
    • This gives the collector permission to read Postgres log files in a default
      install, simplifying Log Insights setup
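    • On existing installs, the equivalent manual step would be (assuming the
      pganalyze system user already exists):

        sudo usermod -aG adm pganalyze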
  • Handle NULL parameters for query samples correctly
  • Add a skip_if_replica / SKIP_IF_REPLICA option (#117)
    • You can use this to configure the collector in a no-op mode on
      replicas (we only check whether the monitored database is a replica),
      and automatically switch to active monitoring when the database is no
      longer a replica.
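    • A config sketch (section name and value format illustrative):

        [replica_db]
        db_host = 10.1.2.3
        skip_if_replica = true

      or, for Docker-based setups, set the environment variable SKIP_IF_REPLICA=1.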
  • Stop building packages for CentOS 6 and Ubuntu 14.04 (Trusty)
    • Both of these systems are now end of life, and the remaining survivor
      of the CentOS 6 line (Amazon Linux 1) will be EOL on December 31st 2020.

v0.34.0

08 Nov 04:51
  • Check and report problematic log collection settings
    • Some Postgres settings almost always cause a drastic increase in log
      volume for little actual benefit. They tend to cause operational problems
      for the collector (due to the load of additional log parsing) and the
      pganalyze service itself (or indeed, likely for any service that would
      process collector snapshots), and do not add any meaningful insights.
      Furthermore, we found that these settings are often turned on
      accidentally.
    • To avoid these issues, add some client-side checks in the collector to
      disable log processing if any of the problematic settings are on.
    • If any of the settings in question are set to these unsupported values,
      all log collection will be disabled for that server. The settings are
      re-checked every full snapshot, and can be explicitly re-checked with a
      collector reload (typical examples of such settings are sketched below).
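    • Typical examples of such settings (illustrative, not necessarily the
      exact list the collector checks):

        log_statement = 'all'            # logs every single statement
        log_duration = on                # logs the duration of every statement
        log_min_duration_statement = 0   # logs all statements with their runtime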
  • Log Insights improvements
    • Self-managed server: Process logs every 3 seconds, instead of on-demand
    • Self-managed server: Improve handling of multi-line log events
    • Google Cloud SQL: Always acknowledge Pub/Sub messages, even if the collector doesn't handle them
    • Optimize stitching logic for reduced CPU consumption
    • Explicitly close temporary files to avoid running out of file descriptors
  • Multiple changes to improve debugging in support situations
    • Report collector config in full snapshot
      • This reports certain collector config settings (except for passwords/keys/credentials)
        to the pganalyze servers to help with debugging.
    • Print collector version at beginning of test for better support handling
    • Print collection status and Postgres version before submitting snapshots
    • Change panic stack trace logging from Verbose to Warning
  • Add support for running the collector on ARM systems
    • Note that we don't provide packages yet, but with this the collector
      can be built on ARM systems without any additional patches.
  • Introduce API system scope fallback
    • This fallback is intended to allow changing the API scope, either based
      on user configuration (e.g. moving the collector between different
      cloud provider accounts), or because of changes in the collector's
      system identification logic.
    • The new "api_system_scope_fallback" / PGA_API_SYSTEM_SCOPE_FALLBACK config
      variable is intended to be set to the old value of the scope. When the
      pganalyze backend receives a snapshot with a fallback scope set, and there
      is no server created with the regular scope, it will first search the
      servers with the fallback scope. If found, that server's scope will be
      updated to the (new) regular scope. If not found, a new server will be
      created with the regular scope. The main goal of the fallback scope is to
      avoid creating a duplicate server when changing the scope value.
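    • A config sketch (scope values illustrative; api_system_scope is assumed
      here to be the regular scope setting):

        [pganalyze]
        api_key = YOUR_API_KEY
        api_system_scope = aws-account-b
        api_system_scope_fallback = aws-account-a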
  • Use new fallback scope mechanism to change scope for RDS databases
    • Previously we identified RDS databases by their ID and region only, but
      the ID does not have to be unique within a region, it only has to be
      unique within the same AWS account in that region. Thus, adjust the
      scope to include both the region and AWS Account ID (if configured or
      auto-detected), and use the fallback scope mechanism to migrate existing
      servers.
  • Add support for GKE workload identity (Yash Bhutwala, #91)
  • Add support for assuming AWS instance roles
    • Set the role to be assumed using the new aws_assume_role / AWS_ASSUME_ROLE
      configuration setting. This is useful when the collector runs in a different
      AWS account than your database.
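    • A config sketch (instance ID, region and role ARN illustrative):

        [my_rds_db]
        aws_db_instance_id = my-rds-instance
        aws_region = us-east-1
        aws_assume_role = arn:aws:iam::123456789012:role/pganalyze-collector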

v0.33.1

11 Sep 15:50
  • Ignore internal admin databases for GCP and Azure
    • This avoids collecting data from these internal databases, which produces
      unnecessary errors when using the "all databases" setting.
  • Add log_line_prefix check to GCP self-test
  • Schema stats handling: Avoid crash due to nil pointer dereference
  • Add support for "%m [%p]: [%l-1] db=%d,user=%u " log_line_prefix