CSV is a popular format for storing tabular data used in many disciplines. Metadata describing the contents of the file is often included in the header, but it rarely follows a machine-readable format - sometimes it is not even human readable! In some cases, such information is provided in a separate file, which is not ideal, as it is easy for data and metadata to become separated.
CSVY is a small Python package to handle CSV files in which the metadata in the header is formatted in YAML. It supports reading/writing tabular data contained in numpy arrays, pandas DataFrames, polars DataFrames, and nested lists, as well as the metadata, using a standard Python dictionary. Ultimately, it aims to incorporate information about the CSV dialect used and a Table Schema specifying the contents of each column, to aid the reading and interpretation of the data.
`pycsvy` is available on PyPI and conda-forge, so installing it is as easy as:
```bash
pip install pycsvy
```

or

```bash
conda install --channel=conda-forge pycsvy
```
In order to support reading into numpy arrays, pandas DataFrames or polars DataFrames, you will need to install those packages, too. This can be done by specifying the relevant extras, e.g.:

```bash
pip install pycsvy[pandas,polars]
```
In the simplest case, to save some data contained in `data` and some metadata contained in a `metadata` dictionary into a CSVY file `important_data.csv` (the extension is not relevant), just do the following:
```python
import csvy

csvy.write("important_data.csv", data, metadata)
```
The resulting file will have the YAML-formatted header between `---` markers, optionally with a comment character starting each header line. It could look something like the following:
```
---
name: my-dataset
title: Example file of csvy
description: Show a csvy sample file.
encoding: utf-8
schema:
  fields:
    - name: Date
      type: object
    - name: WTI
      type: number
---
Date,WTI
1986-01-02,25.56
1986-01-03,26.00
1986-01-06,26.53
1986-01-07,25.85
1986-01-08,25.87
```
For reading the information back:
```python
import csvy

# To read into a numpy array
data, metadata = csvy.read_to_array("important_data.csv")

# To read into a pandas DataFrame
data, metadata = csvy.read_to_dataframe("important_data.csv")

# To read into a polars LazyFrame
data, metadata = csvy.read_to_polars("important_data.csv")

# To read into a polars DataFrame
data, metadata = csvy.read_to_polars("important_data.csv", eager=True)
```
The appropriate writer/reader will be selected based on the type of `data`:

- numpy array: `np.savetxt` and `np.loadtxt`
- pandas DataFrame: `pd.DataFrame.to_csv` and `pd.read_csv`
- polars DataFrame/LazyFrame: `pl.DataFrame.write_csv` and `pl.scan_csv`
- nested lists: `csv.writer` and `csv.reader`
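To illustrate this dispatch, the same `csvy.write` call handles all of these types transparently. A minimal sketch, assuming numpy is installed (file names and values are made up):

```python
import numpy as np

import csvy

metadata = {"name": "my-dataset"}

# A numpy array is written through np.savetxt under the hood...
csvy.write("array_data.csv", np.array([[25.56], [26.00]]), metadata)

# ...while a nested list goes through the standard-library csv.writer.
csvy.write("list_data.csv", [["WTI"], [25.56], [26.00]], metadata)
```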
Options can be passed to the tabular data writer/reader by setting the `csv_options` dictionary. Likewise, you can set the `yaml_options` dictionary with whatever options you want to pass to the `yaml.safe_load` and `yaml.safe_dump` functions, which read and write the YAML-formatted header, respectively.
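As a sketch of how this might look, assuming `csv_options` and `yaml_options` are accepted as keyword arguments and that the data is a numpy array (so the CSV options are forwarded to `np.savetxt`/`np.loadtxt`):

```python
import numpy as np

import csvy

data = np.array([[25.56], [26.00]])
metadata = {"name": "my-dataset"}

csvy.write(
    "important_data.csv",
    data,
    metadata,
    csv_options={"delimiter": ","},      # forwarded to np.savetxt
    yaml_options={"sort_keys": False},   # forwarded to yaml.safe_dump
)

data, metadata = csvy.read_to_array(
    "important_data.csv",
    csv_options={"delimiter": ","},      # forwarded to np.loadtxt
)
```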
You can also instruct a writer to use line buffering, instead of the usual chunk buffering.
Finally, you can control the character(s) used to indicate comments by setting the `comment` keyword when writing a file. By default, there is no comment character (`""`). During reading, the comment character is found automatically.
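For example (a sketch; the `"# "` prefix is just one possible choice):

```python
import csvy

data = [["WTI"], [25.56], [26.00]]
metadata = {"name": "my-dataset"}

# Prefix every header line with "# " so other CSV tools treat it as a comment.
csvy.write("commented_data.csv", data, metadata, comment="# ")
```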
Note that, by default, these reader functions will assume UTF-8 encoding. You can choose a different character encoding by setting the `encoding` keyword argument of any of these reader or writer functions. For example, on Windows, Windows-1252 encoding is often used, which can be specified via `encoding='cp1252'`.
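For instance, to write and read back a file using Windows-1252 (a sketch reusing the functions shown above; pandas is assumed to be installed for the read step):

```python
import csvy

data = [["WTI"], [25.56], [26.00]]
metadata = {"title": "Precio del petróleo"}  # non-ASCII metadata

# Write and read back with Windows-1252 instead of the default UTF-8.
csvy.write("windows_data.csv", data, metadata, encoding="cp1252")
data, metadata = csvy.read_to_dataframe("windows_data.csv", encoding="cp1252")
```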
Thanks goes to these wonderful people (emoji key):
- Diego Alonso Álvarez 🚇 🤔 🚧
- Alex Dewar 🤔
- Adrian D'Alessandro 🐛 💻 📖
- James Paul Turner 🚇 💻
- Dan Cummins 🚇 💻
- mikeheyns 🚇
This project follows the all-contributors specification. Contributions of any kind welcome!