[WIP] Hdf5 io #4

Open
wants to merge 51 commits into
base: master

@alexey0308 (Collaborator) commented Aug 2, 2021

@simon-anders asked for an HDF5 interface to the matrices that are saved at the intermediate steps.

In this PR I'm updating the I/O to use HDF5 instead of npz.
The current approach saves everything in a single file, with one HDF5 group per chromosome.
I am not changing the function interfaces for now: they still take a directory as input, so the output file name
is hardcoded.
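
To make the proposed layout concrete, here is a minimal sketch of one file with one HDF5 group per chromosome. It assumes h5py and NumPy. The file name `matrices.h5`, the dataset names `row`, `col` and `data`, and the helpers `save_coo_by_chromosome` and `load_coo` are hypothetical and only illustrate the idea; they are not the names used in this PR.

```python
import os
import tempfile

import h5py
import numpy as np


def save_coo_by_chromosome(path, matrices):
    """Write one HDF5 group per chromosome into a single file.

    `matrices` maps a chromosome name to (row, col, data, shape).
    All names here are illustrative assumptions.
    """
    with h5py.File(path, "w") as f:
        for chrom, (row, col, data, shape) in matrices.items():
            grp = f.create_group(chrom)
            grp.create_dataset("row", data=np.asarray(row, dtype=np.int64))
            grp.create_dataset("col", data=np.asarray(col, dtype=np.int64))
            grp.create_dataset("data", data=np.asarray(data, dtype=np.float64))
            grp.attrs["shape"] = shape  # matrix dimensions, stored as an attribute


def load_coo(path, chrom):
    """Read the COO triplets and the shape of one chromosome back."""
    with h5py.File(path, "r") as f:
        grp = f[chrom]
        shape = tuple(int(n) for n in grp.attrs["shape"])
        return grp["row"][:], grp["col"][:], grp["data"][:], shape


# The caller still passes a directory; the file name inside it is fixed.
out_dir = tempfile.mkdtemp()
h5_path = os.path.join(out_dir, "matrices.h5")  # hypothetical hardcoded name
save_coo_by_chromosome(
    h5_path,
    {
        "chr1": ([0, 2], [1, 3], [1.0, -1.0], (3, 4)),
        "chr2": ([1], [0], [1.0], (2, 2)),
    },
)
row, col, data, shape = load_coo(h5_path, "chr1")
```

Keeping the directory argument and deriving the file path inside the function is what lets the existing interfaces stay unchanged.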

@simon-anders @LKremer please comment here if you have better alternatives; so far this has only been discussed between Simon and me.

  • ADD ExitStack in the dump COO function so that files are closed even if an exception is raised.
  • ADD counting of non-zero (nnz) elements in the dump COO function, which now returns the count.
  • ADD streamed saving and reading for large data sets:
    A duck-typed object represents the sparse matrix stored in the HDF5 file.
    The prepare function gets an additional argument to choose between in-memory and streamed COO->CSR conversion in HDF5.
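
The first two items can be sketched with the standard library alone. This is an illustration under assumptions, not the code in this PR: the function name `dump_coo`, the record format `(chrom, row, col, value)` and the per-chromosome text files are hypothetical. `contextlib.ExitStack` closes every file it has registered when the `with` block exits, including when an exception is raised partway through.

```python
import os
import tempfile
from contextlib import ExitStack


def dump_coo(records, out_dir):
    """Write COO triplets to one file per chromosome and count them.

    Returns a dict mapping chromosome -> number of non-zero entries written.
    (Hypothetical helper; names and file format are assumptions.)
    """
    nnz = {}
    handles = {}
    with ExitStack() as stack:
        for chrom, row, col, value in records:
            if value == 0:
                continue  # zeros are not stored in a sparse format
            if chrom not in handles:
                path = os.path.join(out_dir, f"{chrom}.coo")
                # Register the file so it is closed on normal exit or on error.
                handles[chrom] = stack.enter_context(open(path, "w"))
                nnz[chrom] = 0
            handles[chrom].write(f"{row},{col},{value}\n")
            nnz[chrom] += 1
    return nnz


out_dir = tempfile.mkdtemp()
records = [
    ("chr1", 0, 1, 1),
    ("chr2", 3, 0, -1),
    ("chr1", 2, 2, 1),
    ("chr1", 5, 5, 0),  # skipped: explicit zero
]
counts = dump_coo(records, out_dir)
```

Returning the count from the dump step means a later step can know the array sizes in advance without a second pass over the data.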
