Support writing string data (TLeafC) to TTrees #516
With #517, we'll at least get a NotImplementedError message. You saw the exception because … just checks whether the object is from the … (which is what's done elsewhere in that file). Unfortunately, you still can't write it because I haven't implemented strings (TLeafC), only the numeric types. The "categoricalness" (the fact that each unique string is stored only once and referred to by integers, a.k.a. "dictionary encoding") would not be preserved in any case, because I don't think ROOT I/O has a type like that; it's definitely not one of the basic TLeaf types. But the data are compressed (unless you opt out), and a compression algorithm stores repeated strings compactly, which has much the same effect as dictionary encoding. At least PR #517 has Uproot return an error message like this:
which would have been clearer than what you got. I'll leave this issue open, since it's a request for writing string data. In the meantime, you might want to use np.unique with
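A minimal sketch of the integer-encoding workaround, assuming np.unique is called with return_inverse=True (the suggestion above is cut off, so this parameter is an assumption) and the unique strings are kept as a separate lookup table. The helper names encode_strings and decode_strings are hypothetical; they are not part of Uproot or NumPy.

```python
import numpy as np

def encode_strings(values):
    """Hypothetical helper: dictionary-encode strings as (sorted unique strings, int32 codes)."""
    categories, codes = np.unique(np.asarray(values), return_inverse=True)
    return categories, codes.astype(np.int32)

def decode_strings(categories, codes):
    """Hypothetical helper: map integer codes back to their strings."""
    return categories[codes]

labels = np.array(["ttbar", "wjets", "ttbar", "data", "wjets"])
cats, codes = encode_strings(labels)
print(cats)   # ['data' 'ttbar' 'wjets']
print(codes)  # [1 2 1 0 2]
```

The int32 codes form an ordinary numeric column, so they can be written like any other numeric branch. Note that np.unique sorts the categories lexicographically, so any other category order has to be stored separately.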
Thanks for the quick reaction and for the tips on how to work around this.
Another reason I used Categorical columns is that the categories are ordered, and I currently use this metadata in my plotting/histogramming functions. I have an MC category in a categorical column, and my histogramming/plotting functions use the ordered categories to define the MC-category axis of a 2D histogram. In the end, this also decides the order in which the components of my stacked histogram are plotted. But I'm sure I could achieve this otherwise with string columns or, as you say, I could just use an integer encoding.
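If the category order matters, a pandas Categorical with ordered=True already carries that order in its integer codes. A sketch, assuming the codes are written as a plain numeric column and the category list is kept alongside it (for example in a config file); split_categorical and restore_categorical are hypothetical helpers, and the category names are made up.

```python
import pandas as pd

def split_categorical(df, column):
    """Hypothetical helper: replace an ordered Categorical column by int32 codes.

    Returns (new_df, categories). Missing values get code -1, which
    pd.Categorical.from_codes maps back to NaN.
    """
    cat = df[column].cat
    codes = cat.codes.to_numpy().astype("int32")
    categories = list(cat.categories)  # order defines the axis/stacking order
    out = df.drop(columns=[column]).assign(**{f"{column}_code": codes})
    return out, categories

def restore_categorical(codes, categories):
    """Hypothetical helper: rebuild the ordered Categorical after reading."""
    return pd.Categorical.from_codes(codes, categories=categories, ordered=True)

df = pd.DataFrame({
    "mc_category": pd.Categorical(
        ["wjets", "ttbar", "signal", "ttbar"],
        categories=["signal", "ttbar", "wjets"],  # desired plotting order
        ordered=True,
    ),
    "weight": [1.0, 0.5, 2.0, 1.5],
})
numeric_df, categories = split_categorical(df, "mc_category")
print(numeric_df["mc_category_code"].tolist())  # [2, 1, 0, 1]
print(categories)  # ['signal', 'ttbar', 'wjets']
```

Unlike np.unique, this keeps the user-defined order rather than a lexicographic one.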
Hi! #940 added support for writing string data (TLeafC). Your use case is now working as expected:
Writing a dataframe containing a `Categorical` column to a ROOT TTree raises an `AttributeError` (see MWE below). I guess this is not supported, and I don't know whether ROOT TTrees even have an equivalent datatype. Of course, as a user I can still convert the columns to integers (or something similar) and then write the resulting dataframe. I saw that with string object columns, we get:

With this issue I'm asking for a more helpful `NotImplementedError` for Categorical axes. It would also be ideal to support writing those datatypes, though I assume that's not trivial and would probably also require implementing string axes etc.

While playing around with examples for this issue, I also tried the experimental `StringArray` datatype (`dtype="string"`), and it also raises the same `AttributeError` instead of a `NotImplementedError`, so maybe I can include that in this issue as well.

Here is my MWE to reproduce the error when writing TTrees from Categorical columns:
which results in the traceback:
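For reference, the clearer error requested in this issue can be sketched as a dtype check performed before writing. This is a hypothetical illustration, not Uproot's actual implementation: check_writable_column and check_writable_dataframe are invented names, and they only detect pandas CategoricalDtype and StringDtype columns.

```python
import pandas as pd

def check_writable_column(name, series):
    """Hypothetical check: reject dtypes that have no numeric TLeaf equivalent."""
    dtype = series.dtype
    if isinstance(dtype, pd.CategoricalDtype):
        raise NotImplementedError(
            f"column {name!r} has a Categorical dtype, which this writer does not "
            "support; convert it to integer codes first"
        )
    if isinstance(dtype, pd.StringDtype):
        raise NotImplementedError(
            f"column {name!r} has the pandas 'string' dtype, which this writer "
            "does not support"
        )

def check_writable_dataframe(df):
    """Run the hypothetical check on every column before writing."""
    for name in df.columns:
        check_writable_column(name, df[name])

df = pd.DataFrame({"x": [1.0, 2.0], "cat": pd.Categorical(["a", "b"])})
try:
    check_writable_dataframe(df)
except NotImplementedError as err:
    print(err)  # names the offending column instead of an AttributeError
```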