Table.append() improvement using some sort of resolution strategy for different column types for the same column name #1169

monil1334 · 2022-11-15T20:03:14Z


I am parsing CSV files from S3, and Tablesaw infers different column types for the same column in different files, so appending the resulting tables fails. For example:

Column 'cxlToPlcSell' has type INTEGER, but column 'cxlToPlcSell' has type DOUBLE.

Can we add a resolution strategy that appends the values using either table1's or table2's column type, or a callback that lets the user decide what to do when the same column is read as different types in different reads?

My CSV files contain 440 columns, and the inferred types vary with the data present in each file. Building a map of column name to column type by hand is too tedious for that many columns.
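
To make the request concrete, here is a minimal sketch of what such a resolution strategy with a user callback could look like. It is plain Java and does not use Tablesaw's API: `ColType`, `ConflictResolver`, and `TypeResolution` are hypothetical names invented for illustration, and the "widen" rule simply picks whichever type comes later in the enum.

```java
import java.util.Objects;

/** Hypothetical column types, ordered narrowest to widest; not Tablesaw's ColumnType. */
enum ColType { INTEGER, LONG, DOUBLE, STRING }

/** Hypothetical callback: decides the type when two tables disagree on a column. */
interface ConflictResolver {
    ColType resolve(String columnName, ColType existing, ColType incoming);
}

final class TypeResolution {
    private TypeResolution() {}

    /** Keep the type of the table being appended to (table1). */
    static final ConflictResolver KEEP_EXISTING = (name, existing, incoming) -> existing;

    /** Take the type of the table being appended (table2). */
    static final ConflictResolver TAKE_INCOMING = (name, existing, incoming) -> incoming;

    /** Pick whichever type is declared later in ColType, e.g. INTEGER + DOUBLE -> DOUBLE. */
    static final ConflictResolver WIDEN = (name, existing, incoming) ->
            existing.ordinal() >= incoming.ordinal() ? existing : incoming;

    /** Returns the agreed type, consulting the callback only when the two sides differ. */
    static ColType resolve(String columnName, ColType existing, ColType incoming,
                           ConflictResolver onConflict) {
        if (existing == incoming) {
            return existing;
        }
        return Objects.requireNonNull(
                onConflict.resolve(columnName, existing, incoming),
                "resolver returned null for column " + columnName);
    }

    public static void main(String[] args) {
        // The conflict from the error message above, resolved by widening.
        System.out.println(resolve("cxlToPlcSell", ColType.INTEGER, ColType.DOUBLE, WIDEN));
        // prints DOUBLE
    }
}
```

An append would then convert both columns to the resolved type before concatenating them; that conversion step is not shown here.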

lwhite1 · 2022-11-15T21:23:57Z

Maintainer

That’s a good idea. You could also limit the column types to a narrow set, and use double for all the numbers.
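
As a rough illustration of why a narrow set of types avoids the mismatch, the sketch below runs a toy type detector over sample values. It is plain Java, not Tablesaw's detection code; `NarrowTypeDetector` and `Kind` are hypothetical names. When INTEGER is left out of the allowed list, an all-integer file and a file with decimals both come back as DOUBLE.

```java
import java.util.Arrays;
import java.util.List;

final class NarrowTypeDetector {
    /** Hypothetical type labels; not Tablesaw's ColumnType. */
    enum Kind { INTEGER, DOUBLE, STRING }

    private NarrowTypeDetector() {}

    /** True if the value can be parsed as the given kind; STRING accepts anything. */
    static boolean fits(Kind kind, String value) {
        try {
            switch (kind) {
                case INTEGER:
                    Integer.parseInt(value.trim());
                    return true;
                case DOUBLE:
                    Double.parseDouble(value.trim());
                    return true;
                default:
                    return true;
            }
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /**
     * Returns the first allowed kind that every non-blank sample fits,
     * falling back to STRING. Blank values are treated as missing and ignored.
     */
    static Kind detect(List<String> samples, List<Kind> allowed) {
        for (Kind kind : allowed) {
            boolean allFit = samples.stream()
                    .filter(v -> !v.trim().isEmpty())
                    .allMatch(v -> fits(kind, v));
            if (allFit) {
                return kind;
            }
        }
        return Kind.STRING;
    }

    public static void main(String[] args) {
        List<Kind> narrow = Arrays.asList(Kind.DOUBLE, Kind.STRING);
        // Both files now agree on the type, so an append would not conflict.
        System.out.println(detect(Arrays.asList("1", "2", "3"), narrow));   // prints DOUBLE
        System.out.println(detect(Arrays.asList("1", "2.5", ""), narrow));  // prints DOUBLE
    }
}
```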


1 reply

monil1334 Nov 15, 2022
Author

Is there a config option to treat all numbers as ColumnType.DOUBLE? And is there a regex matcher that I can apply to the column names, so I can easily limit those columns to ColumnType.TEXT?

lwhite1 · 2022-11-15T23:39:40Z

Maintainer

You could write your own matcher. You can also provide a list of the types to use, so if you say just text, that’s all that will be used. If you say string and double, the numeric columns will be identified as double. This is a config option. You could also write something to concatenate the files before loading.
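
A hand-written matcher along these lines could be as small as the sketch below. It is plain Java and makes no claim about Tablesaw's API: `ColumnNameTypeRules` is a hypothetical helper, the type names are plain strings, and the patterns and the column name `bookingCode` are made up for illustration. The first matching rule wins, and columns that match no rule return empty so the caller can fall back to automatic detection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper: maps column names to type names via ordered regex rules. */
final class ColumnNameTypeRules {
    private static final class Rule {
        final Pattern pattern;
        final String typeName;

        Rule(Pattern pattern, String typeName) {
            this.pattern = pattern;
            this.typeName = typeName;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Adds a rule; rules are checked in the order they were added. */
    ColumnNameTypeRules add(String regex, String typeName) {
        rules.add(new Rule(Pattern.compile(regex), typeName));
        return this;
    }

    /** Returns the type of the first rule whose regex matches the whole column name. */
    Optional<String> typeFor(String columnName) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(columnName).matches()) {
                return Optional.of(rule.typeName);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ColumnNameTypeRules rules = new ColumnNameTypeRules()
                .add(".*(Id|Code)$", "TEXT")   // identifier-like columns stay textual
                .add("cxl.*", "DOUBLE");       // all cxl* measures read as doubles
        System.out.println(rules.typeFor("bookingCode"));   // prints Optional[TEXT]
        System.out.println(rules.typeFor("cxlToPlcSell"));  // prints Optional[DOUBLE]
        System.out.println(rules.typeFor("other"));         // prints Optional.empty
    }
}
```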


0 replies
