WIP: adding tidytable support #271

Open
wants to merge 2 commits into
base: master

xiaodaigh
Collaborator

After adding some tests, it seems that tidytable works quite differently from dplyr, to the point where the code below doesn't work with {disk.frame} but the dplyr equivalent does. It seems that looking up functions or variables from the calling environment is not working.

  fn = function(a, b) {
    a+b
  }
  
  df3 = value %>%
    dt_mutate(b =  fn(num, num)) %>%
    collect

This is either a {disk.frame} issue or an implementation detail of tidytable. It needs investigating.

@markfairbanks

Interesting. Those examples work outside of {disk.frame}, so I wonder what's causing the issue. I'll take a look at {tidytable} as well and see if I can find anything.

@xiaodaigh
Collaborator Author

Actually, the code below seems to work, so we are probably not that far from an integration! Group-bys are a bit trickier, but can be done.

As you can see, integrating many functions like dt_mutate is as simple as calling the create_chunk_mapper function:

fn = function(a, b) {
  a+b
}

dt_mutate = disk.frame::create_chunk_mapper(tidytable::dt_mutate)

library(disk.frame)
value = as.disk.frame(data.frame(num = runif(1e6)))
df3 = value %>%
  dt_mutate(b =  fn(num, num)) %>%
  collect

df3
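
To illustrate the idea behind a chunk mapper, here is a minimal base-R sketch. It is not how {disk.frame} implements `create_chunk_mapper`; `make_chunk_mapper`, `add_b`, and the list-of-data.frames stand-in for a disk.frame are all hypothetical names made up for this example. It only shows the general pattern of lifting a function that works on one data.frame into one that works chunk by chunk.

```r
# Hypothetical stand-in for a chunked data set: a plain list of data.frames.
chunks <- split(data.frame(num = c(1, 2, 3, 4)), c(1, 1, 2, 2))

# make_chunk_mapper() is an illustrative name, not part of {disk.frame}.
# It turns a function of one data.frame into a function of a list of chunks.
make_chunk_mapper <- function(chunk_fn) {
  function(chunk_list, ...) {
    lapply(chunk_list, function(chunk) chunk_fn(chunk, ...))
  }
}

# A per-chunk operation, playing the role of dt_mutate(b = fn(num, num)).
add_b <- function(df, fn) {
  df$b <- fn(df$num, df$num)
  df
}

fn <- function(a, b) a + b

# Lift the per-chunk operation, apply it, then bind the chunks back together
# (the role played by collect in the snippet above).
chunked_add_b <- make_chunk_mapper(add_b)
result <- do.call(rbind, chunked_add_b(chunks, fn = fn))
```

In this sketch `fn` is passed explicitly, so nothing has to be looked up from the calling environment.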

@markfairbanks

Nice! Glad it's looking like it will work

@xiaodaigh
Collaborator Author

However, this doesn't work once setup_disk.frame() is called:

fn = function(a, b) {
  a+b
}

dt_mutate = disk.frame::create_chunk_mapper(tidytable::dt_mutate)

library(disk.frame)
setup_disk.frame() # this makes it fail
value = as.disk.frame(data.frame(num = runif(1e6)))
df3 = value %>%
  dt_mutate(b =  fn(num, num)) %>%
  collect

df3
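
One possible explanation, which is only an assumption here and not confirmed in this thread, is that `setup_disk.frame()` starts background worker sessions and `fn` is never shipped to them. The base-R sketch below uses the `parallel` package (not {disk.frame} itself) to show that a fresh worker process does not see a function defined in the main session until it is exported explicitly.

```r
library(parallel)

fn <- function(a, b) a + b

# Start one fresh background R process as a stand-in for a worker.
cl <- makeCluster(1)

# The worker has its own, initially empty, global environment.
before <- clusterEvalQ(cl, exists("fn", envir = globalenv(), inherits = FALSE))[[1]]

# Explicitly copy `fn` from the current environment to the worker.
clusterExport(cl, "fn", envir = environment())
after <- clusterEvalQ(cl, exists("fn", envir = globalenv(), inherits = FALSE))[[1]]

stopCluster(cl)
```

Here `before` is `FALSE` and `after` is `TRUE`, which is consistent with the earlier observation that variables from the environment are not being found.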

@xiaodaigh
Collaborator Author

In the latest branch of disk.frame, this works flawlessly:

library(tidytable)
library(disk.frame)
setup_disk.frame(2)

fn <- function(a,b) a+b

mutate..disk.frame = create_chunk_mapper(tidytable::mutate.)

value = as.disk.frame(data.frame(num = runif(1e6)))

df3 = value %>%
  mutate.(b =  fn(num, num)) %>% 
  collect

So this should be doable soon.
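
The double dot in `mutate..disk.frame` follows R's S3 naming convention of generic name, a dot, then class name, assuming `mutate.` is an S3 generic. The sketch below uses a made-up generic `shout.` and a made-up class `chunked` to show how such a name is dispatched; neither exists in {tidytable} or {disk.frame}.

```r
# Hypothetical generic whose name ends in a dot, mirroring `mutate.`.
shout. <- function(x, ...) UseMethod("shout.")

# Method name = generic name + "." + class name, hence the double dot.
shout..chunked <- function(x, ...) "chunked method"
shout..default <- function(x, ...) "default method"

obj <- structure(list(), class = "chunked")
res_chunked <- shout.(obj)  # dispatches to shout..chunked
res_default <- shout.(1)    # falls back to shout..default
```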

@bnicenboim

Hi, any updates regarding this branch?

@xiaodaigh
Collaborator Author

This is pending updates to the disk.frame NSE (non-standard evaluation) mechanism.

3 participants