Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: A suite of generally useful functions #182

Open
eric-j-ason opened this issue Nov 14, 2023 · 8 comments
Open

Proposal: A suite of generally useful functions #182

eric-j-ason opened this issue Nov 14, 2023 · 8 comments

Comments

@eric-j-ason
Copy link

eric-j-ason commented Nov 14, 2023

I propose here a suite of functions that I found myself reaching for, and missing in the standard library. I present them in ucode, as that's how I use them; possibly, they should be implemented in C instead.

The tests at the end demonstrates what the functions do.

General

General-purpose functions, most of them for analyzing or manipulating arrays. A source of inspiration is Python's itertools. However, as ucode does not have iterators with lazy evaluation (right?), some of the itertools functions do not make sense in ucode, and some other might be less efficient.

// Identity function. Returns its argument.
export let id = x => x;

// Returns `true` if all elements in `arr` are equal, otherwise returns `false`.
export function uniform(arr) {
	for (let elem in arr) {
		// One might want to make the following comparison using the `equal` function, but that results in a
		// mutual recursion, which doesn't seem to be allowed.
		if (elem != arr[0]) {
			return false;
		}
	}

	return true;
};

// Fold from the left
export function foldl(func, initial, arr) {
	assert(type(func) == "function");
	assert(type(arr) == "array");

	let acc = initial;
	for (let elem in arr) {
		acc = func(acc, elem);
	}

	return acc;
};

// Fold from the right
export let foldr = (func, initial, arr) => foldl((a, b) => func(b, a), initial, reverse(arr));

// Convert to boolean
export function bool(x) {
	return x ? true : false;
};

// Returns `true` if all elements in `arr` are truish, otherwise returns `false`.
export function all(arr) {
	assert(type(arr) == "array");

	return bool(foldl((a, b) => a && b, true, arr));
};

// Returns `true` if at least one element in `arr` is truish, otherwise returns `false`.
export function any(arr) {
	assert(type(arr) == "array");

	return bool(foldl((a, b) => a || b, false, arr));
};

// Takes arrays as arguments and returns their concatenation.
export function concat(...arrs) {
	assert(all(map(arrs, x => type(x) == "array")));

	let c = [];
	for (let arr in arrs) {
		for (let e in arr) {
			push(c, e);
		}
	}

	return c;
};

// Takes arrays as arguments and returns their Cartesian product.
export function cartesian(...arrs) {
	assert(all(map(arrs, x => type(x) == "array")));

	return foldl(
		(acc, set) => concat(...map(acc, a => map(set, s => [...a, s]))),
		[[]], // Singleton set of empty tuple
		arrs
	);
};

// Returns the beginning of an array, for as long as its elements satisfy the predicate.
export function takewhile(pred, arr) {
	assert(type(arr) == "array");

	let count = 0;
	for (let e in arr) {
		if (!pred(e)) {
			break;
		} else {
			count++;
		}
	}

	return slice(arr, null, count);;
};

// Returns the end of an array, beginning with the first element that does notsatisfy the predicate.
export function dropwhile(pred, arr) {
	assert(type(arr) == "array");

	let count = 0;
	for (let e in arr) {
		if (pred(e)) {
			count++;
		} else {
			break;
		}
	}

	return slice(arr, count, null);
};

// Takes any number of functions as arguments and returns their composition. The leftmost argument becomes the inner
// function in the composition, and the rightmost argument becomes the outer function. That way, the sequence of
// functions given to `compose` can be seen as a chain, through which a value can flow from left to right.
export let compose = (...funcs) => foldl((f, g) => x => g(f(x)), id, funcs);

// Akin to matrix transposition. Converts an array of n arrays, each containing m elements, to the corresponding array
// of m arrays, each containing n elements.
export function transpose(arrs) {
	assert(type(arrs) == "array");
	assert(all(map(arrs, x => type(x) == "array")));
	assert(uniform(map(arrs, length)));

	let n = length(arrs);
	let m = length(arrs[0]);

	let z = [];
	for (let j = 0; j < m; j++) {
		z[j] = [];
		for (let i = 0; i < n; i++) {
			z[j][i] = arrs[i][j];
		}
	}

	return z;
};

// Like `transpose`, but takes the n arrays as separate arguments, instead of as elements of one large array.
export let zip = (...arrs) => transpose(arrs);

// Like `map`, but for functions taking more than one argument. The argument list `...arrs` should consist of as many
// arrays as `func` takes arguments
export let map_n = (func, ...arrs) => map(transpose(arrs), x => func(...x));

// Like `map_n`, but the arrays of arguments to `func` are grouped into an array `arrs`.
export let map_arr = (func, arrs) => map_n(func, ...arrs);

// The regular operators `==` and `===` compare arrays and objects by checking wheter they occupy the same place in
// memory. This function, `equal` instead compares the contents (recuresively). Data types other than arrays and objects
// are compared in the regular way.
export function equal(x, y) {
	// If already regular equality obtains, we are done.
	if (x === y) {
		return true;
	}

	let tx = type(x);
	let ty = type(y);

	// If the comparands are arrays, we compare them by their elements.
	if (tx == "array" && ty == "array") {
		return length(x) == length(y) && all(map_n(equal, x, y));
	}

	// If the comparands objects be, by their properties shall they be judged.
	if (tx == "object" && ty == "object") {
		let kx = sort(keys(x));
		let ky = sort(keys(y));

		return equal(kx, ky) && all(map_n(k => equal(x[k], y[k]), kx));
	}

	// By regular equality, `x` and `y` were not equal, and our deep inspection of arrays and objects did not kick
	// in, so we concede that they are indeed not equal.
	return false;
};

// Sum the elements of an array.
export let sum = arr => foldl((a, b) => a + b, 0, arr);

// Multiply the elements of an array.
export let product = arr => foldl((a, b) => a * b, 1, arr);

// Range of numbers
export function range(start, stop) {
	assert(type(start) == "int");
	assert(type(stop) == "int");

	let arr = [];
	for (let i = 0; i < stop - start; i++) {
		arr[i] = start + i;
	}

	return arr;
};

// Returns an array consisting of the element `elem` `n` times.
export function repeat(elem, n) {
	let arr = [];
	for (let i = 0; i < n; i++) {
		arr[i] = elem;
	}

	return arr;
};

// Convert an array of values into an array of two-element arrays, each containing an index and a value.
export function enumerate(arr) {
	assert(type(arr) == "array");

	return zip(range(0, length(arr)), arr);
};

Printing

The pretty printer was written before I learned that there already is one! However, I think I can offer two improvements with my version: it prints [ ] and { } on a single line, and it breaks arrays and objects into multiple lines only when needed to avoid values being too long. Also I'm happy about how short the function could be made (after several rewrites).

// Print with a newline
export function println(...values) {
	print(...values, "\n");
};

// Return a pretty string representation of `x` by breaking up arrays and objects into multiple lines when their
// contents are too long.
export function pretty(x) {
	const indentation = "    ";
	const max_length = 30;

	// Add indentation to the beginning of every line in a block of text.
	function indent(text) {
		return join("\n", map(split(text, "\n"), line => indentation + line));
	}

	const y = sprintf("%J", x);
	if (length(y) <= max_length) {
		// The regular string representation is short enough. Return it.
		return y;
	}

	if (length(x)) {
		if (type(x) == "array") {
			const inner = join(",\n", map(x, pretty));
			return "[\n" + indent(inner) + "\n]";
		}

		if (type(x) == "object") {
			const inner = join(",\n", map(keys(x), key => sprintf("%J", key) + ": " + pretty(x[key])));
			return "{\n" + indent(inner) + "\n}";
		}
	}

	// If we get here, we are dealing with a bool, an int, a double, a string, an empty array, an empty object, `null`,
	// a function, or some type with which I am not familiar. Return the regular string representation, regardless of length.
	return y;
};

// Print the pretty representation of `x`.
export function prettyprint(x) {
	print(pretty(x));
};

// Print the pretty representation of `x`, with a newline at the end.
export function prettyprintln(x) {
	println(pretty(x));
};

Time

// Takes in an array of seconds and nanoseconds, like the one returned by `clock`, and returns an ISO 8601 string
// representation of that date. If no argument is given, the current time is used.
export function iso8601(cl) {
	cl ??= clock();

	assert(type(cl) == "array");
	assert(length(cl) == 2);
	assert(type(cl[0]) == "int");
	assert(type(cl[1]) == "int");
	assert(cl[1] >= 0);

	const ts = gmtime(cl[0]);
	return sprintf("%04d-%02d-%02dT%02d:%02d:%02d,%09dZ", ts.year, ts.mon, ts.mday, ts.hour, ts.min, ts.sec, cl[1]);
};

Random

Returns random data in a few useful forms. Gets its randomness from /dev/random.

import {open} from "fs";

// Returns `n_bytes` of random data.
export function rand_bytes(n_bytes) {
	assert(n_bytes >= 0);

	return open("/dev/random").read(n_bytes);
};

// Returns a random natural number less than `n`.
export function rand_int(n) {
	assert(type(n) == "int");
	assert(n > 0);

	const byte_size = 8;

	let n_bytes = 0;
	for (let m = n; m != 0; m /= 2**byte_size) {
		n_bytes++;
	}

	let random;
	while (true) {
		random = hex(hexenc(rand_bytes(n_bytes)));
		if (random < 2**(n_bytes*byte_size)/n*n) {
			break;
		}
	}

	return random % n;
};

// Returns a string of `n_digits` random hexadecimal digits.
export function rand_hex(n_digits) {
	const hex_size = 4;
	const byte_size = 8;
	assert(byte_size % hex_size == 0);

	return substr(hexenc(rand_bytes((n_digits + 1)/(byte_size/hex_size))), 0, n_digits);
};

// Returns a string of `n_digits` random decimal digits.
export function rand_dec(n_digits) {
	assert(type(n_digits) == "int");
	assert(n_digits >= 0);

	let output = "";
	for (let i = 0; i < n_digits; i++) {
		output += rand_int(10);
	}

	return output;
};

// Returns a random element from the given array.
export function rand_choose(arr) {
	return arr[rand_int(length(arr))];
};

Tests

#!/usr/bin/env ucode

import {uniform, foldl, foldr, bool, all, any, equal, concat, cartesian, takewhile, dropwhile, compose, id, transpose, zip, map_n, map_arr, sum, product, range, repeat, enumerate} from "./lib/general.uc";
import {println, pretty, prettyprint, prettyprintln} from "./lib/printing.uc";
import {rand_bytes, rand_int, rand_hex, rand_dec, rand_choose} from "./lib/random.uc";
import {iso8601} from "./lib/time.uc";

// id
assert(id(5) == 5);
assert(id("abc") == "abc");
assert(id(null) == null);
assert(id(id) == id);
assert(equal(id([0, 1, 2]), [0, 1, 2]));
assert(equal(id({a: 5, b: 6}), {a: 5, b: 6}));

// uniform
assert( uniform([5, 5, 5, 5]));
assert(!uniform([5, 5, 4, 5]));
assert(uniform([]));

// foldl
assert(foldl((a, b) => a + b, 0, [4, 5, 6, 7, 8]) == ((((0 + 4) + 5) + 6) + 7) + 8);
assert(foldl((a, b) => a * b, 1, [4, 5, 6, 7, 8]) == ((((1 * 4) * 5) * 6) * 7) * 8);
assert(foldl((a, b) => a ? "(" + a + "," + b + ")" : b, "", [4, 5, 6, 7, 8]) == "((((4,5),6),7),8)");

// foldr
assert(foldr((a, b) => a + b, 0, [4, 5, 6, 7, 8]) == 4 + (5 + (6 + (7 + 8))));
assert(foldr((a, b) => a * b, 1, [4, 5, 6, 7, 8]) == 4 * (5 * (6 * (7 * 8))));
assert(foldr((a, b) => b ? "(" + a + "," + b + ")" : a, "", [4, 5, 6, 7, 8]) == "(4,(5,(6,(7,8))))");

// bool
assert(bool(5) == true);
assert(bool(0) == false);
assert(bool("") == false);
assert(bool([]) == true);
assert(bool(null) == false);

// all
assert( all([true, true, true,  true]));
assert(!all([true, true, false, true]));
assert( all([4, 5, 6, 7]));
assert(!all([4, 5, 0, 7]));
assert(all([]));

// any
assert(!any([false, false, false, false]));
assert( any([0, 0, 6, 0]));
assert(!any([0, 0, 0, 0]));
assert(!any([]));
assert(!any([0, 0.0, NaN, false, "", null]));

// concat
assert(equal(concat([0, 1, 2], ["a", "b", "c"], [0.0, 1.0, 2.0]), [0, 1, 2, "a", "b", "c", 0.0, 1.0, 2.0]));
assert(equal(concat([]), []));
assert(equal(concat(), []));

// cartesian
assert(equal(cartesian(["a", "b", "c"], [0, 1, 2]), [["a", 0], ["a", 1], ["a", 2], ["b", 0], ["b", 1], ["b", 2], ["c", 0], ["c", 1], ["c", 2]]));
assert(equal(cartesian([1,2], [10,20], [100,200]), [[1,10,100],[1,10,200],[1,20,100],[1,20,200],[2,10,100],[2,10,200],[2,20,100],[2,20,200]]));
assert(equal(cartesian([1,2,3], []), []));
assert(equal(cartesian([], [1,2,3]), []));
assert(equal(cartesian(), [[]]));

// takewhile
assert(equal(takewhile(x => (x <  6), [3, 4, 5, 6, 7, 8, 0, 1, 2]), [3, 4, 5                  ]));
assert(equal(takewhile(x => (x >  6), [3, 4, 5, 6, 7, 8, 0, 1, 2]), [                         ]));
assert(equal(takewhile(x => (x >= 0), [3, 4, 5, 6, 7, 8, 0, 1, 2]), [3, 4, 5, 6, 7, 8, 0, 1, 2]));
assert(equal(takewhile(x => (x <  0), [3, 4, 5, 6, 7, 8, 0, 1, 2]), [                         ]));

// dropwhile
assert(equal(dropwhile(x => (x <  6), [3, 4, 5, 6, 7, 8, 0, 1, 2]), [         6, 7, 8, 0, 1, 2]));
assert(equal(dropwhile(x => (x >  6), [3, 4, 5, 6, 7, 8, 0, 1, 2]), [3, 4, 5, 6, 7, 8, 0, 1, 2]));
assert(equal(dropwhile(x => (x >= 0), [3, 4, 5, 6, 7, 8, 0, 1, 2]), [                         ]));
assert(equal(dropwhile(x => (x <  0), [3, 4, 5, 6, 7, 8, 0, 1, 2]), [3, 4, 5, 6, 7, 8, 0, 1, 2]));

// compose
{
	let f = compose(b64enc, b64dec, hexenc, hexdec);
	assert(f("abcdefghijklm") == "abcdefghijklm");
}
{
	let g = compose(x => map(x, type), sort, uniq, uc);
	assert(
		g([3, 3.0, "", "asdf", id, [0, 1, 2], {}, {a: 5}]) ==
			'[ "ARRAY", "DOUBLE", "FUNCTION", "INT", "OBJECT", "STRING" ]'
	);
}

// transpose
assert(equal(transpose([[1, 2, 3], [11, 22, 33]]), [[1, 11], [2, 22], [3, 33 ]]));

// zip
assert(equal(zip([1, 2], [11, 22], [111, 222]), [[1, 11, 111], [2, 22, 222]]));

// map_n
assert(equal(map_n(index, ["cow", "milk", "butter"], ["w", "i", "a"]), [2, 1, -1]));

// map_arr
assert(equal(map_arr(index, [["cow", "milk", "butter"], ["w", "i", "a"]]), [2, 1, -1]));

// equal
assert( equal([4, 5, 6, 7, 8], [4, 5, 6, 7, 8]));
assert(!equal([4, 5, 6, 7, 8], [4, 5, 6, 1, 8]));
assert(!equal([4, 5, 6, 7, 8], [4, 5, 6, 7   ]));
assert(!equal(0, null));
assert(!equal(0, ""));
assert(!equal(0, "0"));
assert(!equal(0, false));
assert(!equal(1, true));
assert(!equal(null, ""));
assert(!equal(null, "0"));
assert(!equal("0", ""));
assert(equal([], []));
assert(equal([[]], [[]]));
assert(equal([[], 5], [[], 5]));
assert( equal({a: 5, b: "asdf"}, {a: 5,      b: "asdf"        }));
assert( equal({a: 5, b: "asdf"}, {b: "asdf", a: 5             }));
assert(!equal({a: 5, b: "asdf"}, {b: "asdf", a: 6             }));
{
	let x = {
		a: true,
		b: false,
		c: 0,
		d: 1,
		e: 0.0,
		f: 1.0,
		g: "asdf",
		h: "",
		i: [],
		j: [1, 2, 3],
		k: [0, [], [[3, false], "aaa"]],
		l: {foo: true, "bar": 123},
		m: null,
		n: id,
	};
	let y = {
		a: true,
		b: false,
		c: 0,
		d: 1,
		e: 0.0,
		f: 1.0,
		g: "asdf",
		h: "",
		i: [],
		j: [1, 2, 3],
		k: [0, [], [[3, false], "aaa"]],
		l: {foo: true, "bar": 123},
		m: null,
		n: id,
	};

	assert(equal(x, y));
}

// sum
assert(sum([0, 0, 0, 0, 0]) == 0);
assert(sum([4, 5, 6, 7, 8]) == 4 + 5 + 6 + 7 + 8);
assert(sum([]) == 0);

// product
assert(product([1, 1, 1, 1, 1]) == 1);
assert(product([4, 5, 6, 7, 8]) == 4 * 5 * 6 * 7 * 8);
assert(product([]) == 1);
assert(product([1, 2, 3, 0, 4, 5, 6]) == 0);

// range
assert(equal(range(3, 9), [3, 4, 5, 6, 7, 8]));

// repeat
assert(equal(repeat(7, 3), [7, 7, 7]));
assert(equal(repeat(7, 0), []));

// enumerate
assert(equal(enumerate(["x", "y", "z"]), [[0, "x"], [1, "y"], [2, "z"]]));

// iso8601
assert(iso8601([0, 0]) == "1970-01-01T00:00:00,000000000Z");
assert(iso8601([0, 1]) == "1970-01-01T00:00:00,000000001Z");
assert(iso8601([1, 0]) == "1970-01-01T00:00:01,000000000Z");

// println
// Prints to stdout. Not tested.

// pretty
for (let e in [true, false, -5, 0, 5, -5.2, 0.0, 5.2, id, enumerate]) {
	assert(pretty(e) == sprintf("%J", e), e);
}
{
	let obj = {a: 5, b: [], c: [[4,5,6,7,8,99], {x: "asdf", y: null}], d: {a: true, b: false, c: 0, d: {}, e: 0.0, f: 1.0, g: "a\nsdf", h: "", "i:i": [], asdf: print, j: [1, 2, 3], k: [0, [], [[3, false], "aaa"]], liten: {foo: true, "bar": 123}, m: null, n: id}};
	let pretty_string = '{
    "a": 5,
    "b": [ ],
    "c": [
        [ 4, 5, 6, 7, 8, 99 ],
        { "x": "asdf", "y": null }
    ],
    "d": {
        "a": true,
        "b": false,
        "c": 0,
        "d": { },
        "e": 0.0,
        "f": 1.0,
        "g": "a\\nsdf",
        "h": "",
        "i:i": [ ],
        "asdf": "function print(...) { [native code] }",
        "j": [ 1, 2, 3 ],
        "k": [
            0,
            [ ],
            [ [ 3, false ], "aaa" ]
        ],
        "liten": { "foo": true, "bar": 123 },
        "m": null,
        "n": "(x) => { ... }"
    }
}';
	assert(pretty(obj) == pretty_string);
}

// prettyprint
// Prints to stdout. Not tested.

// prettyprintln
// Prints to stdout. Not tested.

// rand_bytes
// Non-deterministic. Not tested.

// rand_int
// Non-deterministic. Not tested.

// rand_hex
// Non-deterministic. Not tested.

// rand_dec
// Non-deterministic. Not tested.

// rand_choose
// Non-deterministic. Not tested.
@jow-
Copy link
Owner

jow- commented Dec 7, 2023

(I thought I replied to it but apparently I never submitted... oops)

I like this idea and pondered for a while to start a kind of "standard library" with useful functionality implemented in ucode. Question is how to package it (ucode-stdlib?) and how to organize the "classes" within that library, e.g. whether to use python style short names (math, itertools, array, ...) or Java style name hierarchies...

I'm slightly leaning towards the short name approach but would love to hear other opinions.

@eric-j-ason
Copy link
Author

Question is how to package it (ucode-stdlib?) and how to organize the "classes" within that library, e.g. whether to use python style short names (math, itertools, array, ...) or Java style name hierarchies...

It was many thousands of years ago since I wrote Java. Would you remind us of what its name hierarchies are like?

@jow-
Copy link
Owner

jow- commented Dec 8, 2023

Would you remind us of what its name hierarchies are like?

Example from the official documentation: https://docs.oracle.com/javase/7/docs/api/index.html

@eric-j-ason
Copy link
Author

Example from the official documentation: https://docs.oracle.com/javase/7/docs/api/index.html

Got it. So, more levels, I take it. Personally, I don't have a strong opinion one way or the other.

@jow-
Copy link
Owner

jow- commented Dec 8, 2023

Okay, so let's define a number of "classes" based upon your submission:

  • "General" => /usr/share/ucode/array.uc
    • Usage: import { uniform, ... } from 'array';
    • Scope: Everything array & list related
  • "Printing" => /usr/share/ucode/output.uc
    • Usage: import { println, ... } from 'output';
    • Scope: anything related to formatting and output (stdout, stderr, file), could gain further convenience functions such as printf with named placeholders etc.
  • "Time" => /usr/share/ucode/datatime.uc
    • Usage: import { iso8601 } from 'datatime';
    • Scope: time & calendar related functions, could gain functionality to calculate delta between dates, strptime & strftime equivalents etc.
  • "Random" => /usr/share/ucode/random.uc
    • Usage: import { rand_int, ... } from 'random';
    • Scope: various randomness functions

Maybe the "random" and "output" modules are too narrow in scope and should be made into a "numeric" and "data" module to deal with anything numerical (calculations, randomness, ...) and data structure (formatting, transforming, higher level structures) respectively.

@eric-j-ason
Copy link
Author

I added the functions product, prettyprint and prettyprintln.

@eric-j-ason
Copy link
Author

Okay, so let's define a number of "classes" based upon your submission:

Cool!

Some of the functions in "general" are not array related, or only partially so, so if there will be a class called "array", they should probably be outside of it:

  • bool
  • compose
  • equal
  * Usage: `import { iso8601 } from 'datatime';`
  * Scope: time & calendar related functions, could gain functionality to calculate delta between dates, strptime & strftime equivalents etc.

(I assume it was meant to say "datetime".) Yeah, date calculation and manipulation would be useful. It's a mess to deal with, so I'm thinking that it makes sense to call upon some existing library.

Maybe the "random" and "output" modules are too narrow in scope and should be made into a "numeric" and "data" module to deal with anything numerical (calculations, randomness, ...) and data structure (formatting, transforming, higher level structures) respectively.

The "random" category should probably be extended:

  • Allow for instantiating random-number generators with a seed, so that random numbers can be generated reproducibly.
  • Generate floating-point numbers, uniformly distributed on the interval [0, 1].
  • Choose elements randomly from an array, according to provided weights.
  • Possibly too far into the territory of a specialized numerical library: Draw from various common (discrete and continuous) probability distributions.

@eric-j-ason
Copy link
Author

Updated the equal function to use strict equality (===). It's probably good to default to strict equality. It might come as an unfortunate surprise for a user to have 0 equal "" and similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants