Passing --raw-output jq flag to enable json -> csv conversion to be "pushed down" to jq and enabling jqr to pick up non-json results #84

Open
mskyttner opened this issue Dec 7, 2020 · 4 comments


@mskyttner

mskyttner commented Dec 7, 2020

This is an issue, or perhaps a feature request, about having jqr support raw output from jq, where the return type is not json. The use case is using jqr to "push down" some queries/work to jq, which would benefit from jqr supporting the "--raw-output" option when returning, for example, csv data.

An example of such a use case is working with a large json(ish) file, getting only some elements, converting those to csv (with all these steps pushed down to jq) and then from jqr picking up these non-json raw results.

In bash, the command would be similar to this one: cat jsons | jq -r '[.id, .orcid] | @csv'. Here is an illustration from R with some example data:

library(jqr)
library(readr)

# data is not valid (nd)json, it is what it is..., and looks like this:
jsons <- 
  '{
    "id": "u1003nxf",
    "orcid": "",
    "profile": {
        "firstName": "John",
        "lastName": "Doe"
    }
}
{
    "id": "u1002cfh",
    "orcid": "",
    "profile": {
        "firstName": "Jo",
        "lastName": "Doe"
    }
}
'

# attempting to use jqr, the "--raw-output" is not amongst the jq_flags()
# so the output always gets returned as if it was json, with quotes, ...
# which is a bit cumbersome to dejsonify after this has happened...

readr::read_lines(jsons) %>% 
  jqr::jq("[.id, .orcid] | @csv") %>%
  as.character()
# [1] "\"\\\"u1003nxf\\\",\\\"\\\"\"" "\"\\\"u1002cfh\\\",\\\"\\\"\""

# this CLI invocation of jq works for converting jsons to csv
# but it bypasses jqr completely
writeLines(jsons, "jsons")
csv <- system("cat jsons | jq -r '[.id, .orcid] | @csv'", intern = TRUE)
readr::read_csv(csv, col_names = c("id", ".orcid"))

# A tibble: 2 x 2
#id       .orcid
#<chr>    <lgl> 
# 1 u1003nxf NA    
# 2 u1002cfh NA

# there doesn't seem to be a straightforward way to do this with `jqr` currently?

Workarounds or fixes

jq seems to support these options, as noted in a related issue. So currently in jqr, the jq_flags are more like jv_dump_string_flags(), as described here.

So the "expected" jq command line options are not currently passed through in jqr: they are hardcoded and set to 0 here, unlike when jq runs at the command line.

Because of this, there doesn't seem to be a way to pass a "--raw-output" option through jqr right now. Everything gets converted to json first.

If the step that processes the jq query/program/DSL could take a parameter for the jq options/flags, it could branch and use jv_string_value for raw output instead of passing everything to jv_dump_string. Right now everything goes to jv_dump_string, which is what I think causes the "double quoting" of quote characters before these non-json results are returned to jqr.
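If the double quoting does come from each result being dumped as a JSON string, then parsing each returned element as JSON should recover the raw text that jq -r would print. The following is a minimal sketch under that assumption, not a fix in jqr itself. It assumes the jsonlite package is installed (it is not mentioned in the report) and uses the two values from the output shown above as hardcoded input:

```r
# Assumption: jsonlite is installed; it is not part of the original report.
library(jsonlite)

# The two values shown in the report, as returned by jqr:
# each element is a JSON string literal wrapping one CSV line.
json_strings <- c(
  "\"\\\"u1003nxf\\\",\\\"\\\"\"",
  "\"\\\"u1002cfh\\\",\\\"\\\"\""
)

# Parsing each JSON string literal strips the outer quoting and escapes.
raw_lines <- vapply(json_strings, function(x) jsonlite::fromJSON(x),
                    character(1), USE.NAMES = FALSE)

# Read the recovered CSV lines with base R, keeping every field as character.
df <- read.csv(text = paste(raw_lines, collapse = "\n"), header = FALSE,
               col.names = c("id", "orcid"), colClasses = "character")
print(df)
```

With jqr in the loop, json_strings would presumably come from as.character(jqr::jq(jsons, "[.id, .orcid] | @csv")). This decodes one element at a time in R, so it may not suit very large outputs, where a real raw-output path would avoid the round trip.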

Related issues

I think this issue relates to these other issues (with the variation that it asks for jqr to pass the --raw-output flag/option and then return something other than json, in order to support a jq query push down which uses "|@csv"):

@sckott
Collaborator

sckott commented Dec 7, 2020

Thanks for the issue!

It's not clear if we can support raw output or not. The flags stuff is a bit beyond my comprehension of jq and our interface with it. I'd like to support this, but probably can't sort this out myself.

Do you know of a proposed fix?

@mskyttner
Author

mskyttner commented Dec 7, 2020

I tried to untangle it a little but I'm not so sure about the low-level stuff. I think a proposed fix would perhaps involve steps like:

  • Add a parameter (for passing "raw-output" and other such jq options/flags) here, basically adding code to do more of what is done here in main.c in a similar way, I guess, including dealing with some of those errors. Right now it looks like a cleverly simplified version which avoids dealing with the errors and skips the raw output steps/branching.
  • Change the flag passing, which bakes some things together right now and appears to mix jq command line options with "jv_dump_string flags" at a higher level, and then splits them out a bit further in, here.
  • Somehow make sure these flags, for example for "--raw-output", get passed all the way through from R, especially in the process step when a program is being run, where the value currently gets set to 0.

Not brave enough to do a PR on it though, I'm afraid.

@mskyttner
Author

My workaround for now, when using jq with "--raw-output" while converting to CSV, is to "shell out" to a 4 MB docker image and pass the results back to R:

# workaround but depends on docker and a 4 MB docker image with jq
# attempted to use stevedore first, but ran into issues with the command splitter
# and with capturing output from the command

# function to enable running "jq" with --raw-output through docker
docker_cli_runc <- function(slug, command, v_host, v_container) {
  
  stopifnot(file.exists(v_host))
  
  cli_runc <- function(slug, command, v_host, v_container)
    sprintf("docker run --rm -v %s:%s %s %s", 
      v_host, v_container, slug, command)
  
  cmd <- cli_runc(slug, command, v_host, v_container)
  
  system(cmd, intern = TRUE)
  
}

# small example data
jsons <- 
  '{
    "id": "u1003nxf",
    "orcid": "",
    "profile": {
        "firstName": "John",
        "lastName": "Doe"
    }
}
{
    "id": "u1002cfh",
    "orcid": "",
    "profile": {
        "firstName": "Jo",
        "lastName": "Doe"
    }
}
'
# available on the host's disk
readr::write_lines(jsons, "~/temp/jsons")

# test running jq with --raw-output
docker_cli_runc(
  slug = "docker.io/endeveit/docker-jq",
  command = "cat /tmp/jsons | jq -r '[.id, .orcid] | @csv'",
  v_host = "~/temp/jsons",
  v_container = "/tmp/jsons"
)

# poor man's wrapper
jq <- function(file, query) {
  command <- sprintf("cat /tmp/jsons | %s", query)
  docker_cli_runc(
    slug = "docker.io/endeveit/docker-jq",
    command = command,
    v_host = file, v_container = "/tmp/jsons")  
}

# using it on arbitrary json files
library(magrittr)

"~/temp/jsons" %>% 
  jq("jq -r '[.id, .orcid] | @csv'") %>%
  readr::read_csv(col_names = c("id", "orcid"))
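When a jq binary is installed locally, a variant of this workaround can skip Docker and the temporary mount by feeding the JSON to jq on stdin. This is a rough sketch only: jq_raw is a hypothetical helper, not part of jqr, and it assumes a Unix-like system with jq on the PATH.

```r
# Hypothetical helper, not part of jqr: run a local jq binary with --raw-output.
jq_raw <- function(json_text, query, jq_bin = unname(Sys.which("jq"))) {
  stopifnot(nzchar(jq_bin))  # fail early if jq is not on the PATH
  # system2() writes `input` to a temporary file and redirects it to stdin;
  # stdout = TRUE captures jq's output as one string per line.
  system2(jq_bin, args = c("-r", shQuote(query)),
          input = json_text, stdout = TRUE)
}

# compact version of the example data used above
jsons <- c(
  '{"id": "u1003nxf", "orcid": "", "profile": {"firstName": "John", "lastName": "Doe"}}',
  '{"id": "u1002cfh", "orcid": "", "profile": {"firstName": "Jo", "lastName": "Doe"}}'
)

# only run when jq is actually available
if (nzchar(Sys.which("jq"))) {
  csv_lines <- jq_raw(jsons, "[.id, .orcid] | @csv")
  df <- read.csv(text = paste(csv_lines, collapse = "\n"), header = FALSE,
                 col.names = c("id", "orcid"), colClasses = "character")
  print(df)
}
```

Like the Docker version, this bypasses jqr entirely, so it is a stopgap and does not address passing the flag through jqr.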

@sckott
Collaborator

sckott commented Dec 16, 2020

Thanks - I will try to have a look soon, no promises
