muutils, stylized as "$\mu$utils" or "μutils", is a collection of miscellaneous python utilities, meant to be small and with no dependencies outside of standard python.
PyPi: muutils
pip install muutils
Note that to use mlutils, tensor_utils, nbutils.configure_notebook, or the array serialization features of json_serialize, you will need to install with the optional array dependencies:
pip install muutils[array]
hosted html docs: https://miv.name/muutils
- single-page html docs (absolute source link)
- single-page markdown docs (absolute source link)
- Test coverage: webpage (absolute source link) (plain text)
an extension of collections.Counter that provides "smart" computation of stats (mean, variance, median, other percentiles) from the counter object without using Counter.elements()
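To illustrate the idea of computing stats directly from counts, here is a minimal sketch using only the standard library. The function names (counter_mean, counter_variance, counter_percentile) are hypothetical and are not the muutils API; the percentile uses the nearest-rank definition, which is an assumption and may differ from what the library does.

```python
import math
from collections import Counter


def counter_mean(c: Counter) -> float:
    """Weighted mean of the keys, using the counts as weights."""
    total = sum(c.values())
    if total == 0:
        raise ValueError("empty counter")
    return sum(key * n for key, n in c.items()) / total


def counter_variance(c: Counter) -> float:
    """Population variance of the keys, weighted by count."""
    mu = counter_mean(c)
    total = sum(c.values())
    return sum(n * (key - mu) ** 2 for key, n in c.items()) / total


def counter_percentile(c: Counter, p: float):
    """Nearest-rank percentile for 0 <= p <= 1.

    Walks the sorted keys and accumulates counts, so the expanded
    sequence from Counter.elements() is never materialized.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    total = sum(c.values())
    if total == 0:
        raise ValueError("empty counter")
    rank = max(1, math.ceil(p * total))
    seen = 0
    for key in sorted(c):
        seen += c[key]
        if seen >= rank:
            return key


counts = Counter({1: 2, 2: 1, 10: 1})  # same data as [1, 1, 2, 10]
mean = counter_mean(counts)
variance = counter_variance(counts)
median = counter_percentile(counts, 0.5)
```

The cost depends on the number of distinct keys rather than the total count, which is the point of avoiding Counter.elements().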
has utilities for working with dictionaries, like:
- converting dotlist-dictionaries to nested dictionaries and back:
  >>> dotlist_to_nested_dict({'a.b.c': 1, 'a.b.d': 2, 'a.e': 3})
  {'a': {'b': {'c': 1, 'd': 2}, 'e': 3}}
  >>> nested_dict_to_dotlist({'a': {'b': {'c': 1, 'd': 2}, 'e': 3}})
  {'a.b.c': 1, 'a.b.d': 2, 'a.e': 3}
- DefaulterDict which works like a defaultdict but can generate the default value based on the key
- condense_tensor_dict takes a dict of dotlist-tensors and gives a more human-readable summary:
  >>> model = MyGPT()
  >>> print(condense_tensor_dict(model.named_parameters(), 'yaml'))
  embed:
    W_E: (50257, 768)
  pos_embed:
    W_pos: (1024, 768)
  blocks:
    '[0-11]':
      attn:
        '[W_Q, W_K, W_V]': (12, 768, 64)
        W_O: (12, 64, 768)
        '[b_Q, b_K, b_V]': (12, 64)
        b_O: (768,)
  <...>
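The dotlist conversion and the key-aware default can be sketched with plain dictionaries. The names below (to_nested, to_dotlist, KeyDefaultDict) are hypothetical stand-ins written for illustration, not the library's implementation, and the sketch assumes string keys and "." as the separator.

```python
from typing import Any, Callable, Dict


def to_nested(dotlist: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Turn {'a.b': 1} into {'a': {'b': 1}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in dotlist.items():
        *parents, leaf = dotted_key.split(sep)
        node = nested
        for part in parents:
            # create intermediate dicts on the way down
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def to_dotlist(nested: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Turn {'a': {'b': 1}} into {'a.b': 1}. Empty sub-dicts are dropped."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(to_dotlist(value, path, sep))
        else:
            flat[path] = value
    return flat


class KeyDefaultDict(dict):
    """Like defaultdict, but the factory receives the missing key."""

    def __init__(self, factory: Callable[[Any], Any]):
        super().__init__()
        self.factory = factory

    def __missing__(self, key):
        value = self.factory(key)
        self[key] = value  # store it, as defaultdict does
        return value


flat = {"a.b.c": 1, "a.b.d": 2, "a.e": 3}
nested = to_nested(flat)
squares = KeyDefaultDict(lambda k: k * k)
nine = squares[3]
```

The standard collections.defaultdict calls its factory with no arguments, so the key-aware variant needs `__missing__` on a dict subclass.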
Anonymous getitem, so you can do things like
>>> k = Kappa(lambda x: x**2)
>>> k[2]
4
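A class with this behaviour can be sketched in a few lines by forwarding `__getitem__` to a stored function. KappaSketch is a hypothetical name used here for illustration; the actual Kappa class may do more (for example, implement the full Mapping interface).

```python
from typing import Any, Callable


class KappaSketch:
    """Wraps a function so that indexing calls it: obj[x] == func(x)."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __getitem__(self, key: Any) -> Any:
        return self.func(key)


square = KappaSketch(lambda x: x**2)
result = square[2]
```

This is handy when an API expects something indexable but the values are cheaper to compute than to store.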
utility for getting a bunch of system information. useful for logging.
contains a few utilities:
- stable_hash() uses hashlib.sha256 to compute a hash of an object that is stable across runs of python
- list_join and list_split which behave like str.join and str.split but for lists
- sanitize_fname and dict_to_filename for simplifying the creation of unique filenames
- shorten_numerical_to_str() and str_to_numeric turn numbers like 123456789 into "123M" and back
- freeze, which prevents an object from being modified. Also see gelidum
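Two of these ideas are easy to sketch with the standard library. The built-in hash() of a string is salted per process unless PYTHONHASHSEED is fixed, which is why a digest such as sha256 is needed for run-to-run stability. The names below (stable_digest, split_list, join_lists) are hypothetical, their signatures are assumptions rather than the muutils API, and stable_digest only handles JSON-serializable objects.

```python
import hashlib
import json
from typing import Any, List


def stable_digest(obj: Any) -> str:
    """Hex sha256 of a canonical JSON encoding of obj.

    sort_keys=True makes the result independent of dict insertion order.
    """
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def split_list(items: List[Any], sep: Any) -> List[List[Any]]:
    """Split a list at every element equal to sep, like str.split(sep)."""
    chunks: List[List[Any]] = [[]]
    for item in items:
        if item == sep:
            chunks.append([])
        else:
            chunks[-1].append(item)
    return chunks


def join_lists(chunks: List[List[Any]], sep: Any) -> List[Any]:
    """Concatenate lists with sep placed between them, like sep.join(...)."""
    out: List[Any] = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            out.append(sep)
        out.extend(chunk)
    return out


digest_a = stable_digest({"a": 1, "b": 2})
digest_b = stable_digest({"b": 2, "a": 1})
parts = split_list([1, 0, 2, 3, 0, 4], 0)
rejoined = join_lists(parts, 0)
```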
contains utilities for working with jupyter notebooks, such as:
- quickly converting notebooks to python scripts (and running those scripts) for testing in CI
- configuring notebooks, to make it easier to switch between figure output formats, locations, and more
- shorthand for displaying mermaid diagrams and TeX
a tool for serializing and loading arbitrary python objects into json. plays nicely with ZANJ
contains minor utilities for working with pytorch tensors and numpy arrays, mostly for making type conversions easier
groups elements from a sequence according to a given equivalence relation, without assuming that the equivalence relation obeys the transitive property
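One way to read "without assuming transitivity" is that an element joins every group containing something it is related to, and those groups are then merged. The sketch below follows that reading; group_by_relation is a hypothetical name, the relation is assumed to be symmetric, and the library's actual behaviour may differ.

```python
from typing import Any, Callable, List, Sequence


def group_by_relation(
    items: Sequence[Any],
    related: Callable[[Any, Any], bool],
) -> List[List[Any]]:
    """Group items so that related items always share a group.

    Because the relation need not be transitive, a new item can be
    related to members of several existing groups; those groups are merged.
    """
    groups: List[List[Any]] = []
    for item in items:
        # every existing group that contains something related to item
        matching = [g for g in groups if any(related(item, other) for other in g)]
        merged = [x for g in matching for x in g] + [item]
        # keep the untouched groups, then add the merged one
        groups = [g for g in groups if not any(g is m for m in matching)]
        groups.append(merged)
    return groups


def close(a: int, b: int) -> bool:
    """Related if within 1 of each other: symmetric but not transitive."""
    return abs(a - b) <= 1


chained = group_by_relation([1, 2, 3, 10, 11, 20], close)
bridged = group_by_relation([1, 3, 2], close)
```

In the second call, 1 and 3 start in separate groups and are merged only once 2 arrives and relates to both.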
an extremely simple utility for reading/writing jsonl files
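For reference, the jsonl format is one JSON document per line, so a reader and writer fit in a few lines of standard-library code. write_jsonl and read_jsonl are hypothetical names for this sketch and are not necessarily what the module exports.

```python
import json
from typing import Any, Iterable, List


def write_jsonl(path: str, items: Iterable[Any]) -> None:
    """Write each item as one JSON document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")


def read_jsonl(path: str) -> List[Any]:
    """Read a jsonl file back into a list, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

json.dumps does not emit raw newlines by default, so each item stays on a single line.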
ZANJ is a human-readable and simple format for ML models, datasets, and arbitrary objects. It's built around having a zip file with json and npy files, and has been spun off into its own project.
There are a couple work-in-progress utilities in _wip that aren't ready for anything, but nothing in this repo is suitable for production. Use at your own risk!