Skip to content

Crockford Base32 encoding for PostgreSQL unsigned integers

License

Notifications You must be signed in to change notification settings

grzm/pgcrockford

Repository files navigation

pgcrockford - Crockford Base32 encoding for PostgreSQL unsigned integers

The pgcrockford extension provides Base 32 encoded integers as a PostgreSQL base type. From Douglas Crockford's requirements for the specification, the encoding should

  • Be human readable and machine readable.
  • Be compact. Humans have difficulty in manipulating long strings of arbitrary symbols.
  • Be error resistant. Entering the symbols must not require keyboarding gymnastics.
  • Be pronounceable. Humans should be able to accurately transmit the symbols to other humans using a telephone.

The motivation for pgcrockford is to provide a more humane display of values used for surrogate keys while preserving the efficiency of an integer implementation: crockford values are represented as alpha-numeric strings while using unsigned integers for their implementation.

Releases and Dependencies

Latest release: 0.8.34

Tested with PostgreSQL 11.

Installation

make
make install
make installcheck
CREATE SCHEMA crockford;
CREATE EXTENSION crockford WITH SCHEMA crockford;

Note: The extension doesn't require a dedicated schema, but it's good practice to utilize schemas to namespace modules.

Usage

The pgcrockford extension provides 3 base types: 2-byte, 4-byte, and 8-byte implementations.

  • crockford2 (2-byte)
  • crockford4 (4-byte)
  • crockford8 (8-byte)
-- interpret '10' as a crockford literal
SELECT '10'::crockford.crockford4;
-- 10

-- cast integer 10 to crockford
SELECT 10::crockford.crockford4;
-- A
CREATE TABLE store.widgets (
  widget_id crockford.crockford4 GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  widget_name text UNIQUE NOT NULL
);

INSERT INTO store.widgets (widget_name)
  SELECT 'widget' || n FROM generate_series(1,64) AS _ (n);

SELECT * FROM store.widgets ORDER BY widget_id DESC LIMIT 10;
/*
 widget_id | widget_name
-----------+-------------
 20        | widget64
 1Z        | widget63
 1Y        | widget62
 1X        | widget61
 1W        | widget60
 1V        | widget59
 1T        | widget58
 1S        | widget57
 1R        | widget56
 1Q        | widget55
(10 rows)
*/

Getting funky with representation

One of the drawbacks of using sequences as surrogate keys is that there's nothing to distinguish an id value of one table from an id value of another. If both store.widget and store.locations have id surrogate key columns, is 1232 a widget_id or a location_id? Setting aside debates of whether one should use surrogate keys or column naming conventions, you can leverage the alpha-numeric representation of the crockford types to encode the type in the id value.

CREATE FUNCTION store.next_widget_id()
RETURNS crockford.crockford4
LANGUAGE sql AS
$body$
  SELECT 'W00000' + nextval('store.widgets_widget_id_seq');
$body$;

CREATE FUNCTION store.next_location_id()
RETURNS crockford.crockford4
LANGUAGE sql AS
$body$
  SELECT 'L00000' + nextval('store.locations_location_id_seq');
$body$;

These functions will provide 33,554,431 unique values each (W00001..WZZZZZ and L00001..LZZZZZ respectively) before encroaching on the encoded "type".

Operators and search_path

If you load crockford into a schema that's not on the search_path, the crockford-specific operators won't be available without specifying the schema explicitly. This is normal behavior for PostgreSQL, but may be suprising if you haven't seen it before.

SELECT 'A'::crockford.crockford4 + 1;
-- 11

-- specify the schema explicitly using OPERATOR
SELECT 'A'::crockford.crockford4 OPERATOR(crockford.+) 1;
-- B

SET search_path TO crockford;

SELECT 'A'::crockford4 + 1;
-- B

If you're using crockford a lot, you'll likely want to either explicitly set the search_path to include the crockford extension schema or set the default search_path for your database. For example,

ALTER ROLE grzm SET search_path TO 'crockford';

You can also set search_path per role:

ALTER DATABASE crockford_test SET search_path to 'crockford';

The fine PostgreSQL manual includes more discussion of schema usage.

Thanks!

The pgcrockford extension takes inspiration from Peter Eisentraut's pguint library, both for using unisigned integers as the underlying implementation and also for generating some of the code that implements it (see generate.py).

I also found Manuel Kniep's series (parts 1, 2, and 3, 4, and 5) on writing PostgreSQL Extensions useful as well--and not only because it focused on writing pg-base36.

Licence

© 2019 Michael Glaesemann

Released under the PostgreSQL License. See LICENSE file for details.