Store data in a structure of folders and subfolders, so that data are 'naturally' sorted #335
GitHunter0
started this conversation in
Ideas
This is a good idea! I wonder if I should prioritise this above the NSE rewrite. Maybe I should. I am currently working on fixing JDF.jl as my OS commitment; after that I can come back to disk.frame.
Hi folks, this is a follow-up to the discussion I had with @xiaodaigh here: #327 (comment)
This would make it possible to work with longitudinal/panel data without applying the very expensive 'hard sorting' operation.
The picture below shows a tree example:
That way the id variable (country) and the date variable (year) would already be sorted from the start.
That is precisely the strategy adopted by Apache Arrow, as demonstrated here: https://cran.r-project.org/web/packages/arrow/vignettes/dataset.html
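To make the folder idea concrete, here is a minimal sketch in plain Python (standard library only). It is not disk.frame's or Arrow's implementation. The `country=<id>/year=<yyyy>/part.csv` naming, the `value` column, the function names and the sample rows are all assumptions made up for illustration. Rows arrive unsorted, each one is appended to the folder matching its keys, and reading the folders back in name order yields the data ordered by country and year without ever sorting the rows themselves.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(rows, root):
    """Append each row to root/country=<c>/year=<y>/part.csv (hypothetical layout)."""
    root = Path(root)
    for row in rows:
        folder = root / f"country={row['country']}" / f"year={row['year']}"
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / "part.csv"
        is_new = not path.exists()
        with path.open("a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["value"])
            if is_new:
                writer.writeheader()
            # The keys live in the folder names, so only the payload is stored.
            writer.writerow({"value": row["value"]})


def read_in_order(root):
    """Yield (country, year, value) by walking the folders in key order."""
    root = Path(root)
    # Only the (few) folder names are sorted, never the rows.
    for country_dir in sorted(root.glob("country=*")):
        country = country_dir.name.split("=", 1)[1]
        year_dirs = sorted(
            country_dir.glob("year=*"),
            key=lambda p: int(p.name.split("=", 1)[1]),
        )
        for year_dir in year_dirs:
            year = int(year_dir.name.split("=", 1)[1])
            with (year_dir / "part.csv").open(newline="") as f:
                for rec in csv.DictReader(f):
                    yield country, year, float(rec["value"])


if __name__ == "__main__":
    # Placeholder rows, deliberately out of order.
    rows = [
        {"country": "BR", "year": 2020, "value": 2.0},
        {"country": "AR", "year": 2020, "value": 1.0},
        {"country": "BR", "year": 2019, "value": 4.0},
        {"country": "AR", "year": 2019, "value": 3.0},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        write_partitioned(rows, tmp)
        for record in read_in_order(tmp):
            print(record)
```

A query for a single country or year then only needs to open the matching folders, which is the same reason the ordering comes for free.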
This feature is, in my opinion, one of the last pieces needed for more frequent/universal usage of disk.frame (which is already a phenomenal package).