function: 'lowest' common type #157

majidaldo · 2020-12-22T17:36:50Z

Sometimes going through a whole array is not needed. You have the types of the subsets of the array and you just want to get a compatible data type for all subsets.

A common scenario when assembling horrible CSVs is that the same column gets inferred as different types in different files. For example, one file yields int and another float, so the combined column should widen to float (float <-- int). The worst case is to 'fall back' to string.
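
A rough sketch of the idea, assuming a hand-written widening table rather than anything from visions: the type names and the `WIDENS_TO` mapping below are hypothetical, and `lowest_common_type` is just an illustrative name for the requested function.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical widening table: each type points at the next wider type it can
# fall back to. The names are illustrative and are not visions types.
WIDENS_TO: Dict[str, Optional[str]] = {
    "integer": "float",
    "float": "string",
    "boolean": "string",
    "string": None,
}


def widening_chain(type_name: str) -> List[str]:
    """Return the type followed by every wider type it can fall back to."""
    chain = []
    current: Optional[str] = type_name
    while current is not None:
        chain.append(current)
        current = WIDENS_TO[current]
    return chain


def lowest_common_type(type_names: Iterable[str]) -> str:
    """Pick the narrowest type that every given type can be widened to."""
    chains = [widening_chain(name) for name in type_names]
    if not chains:
        raise ValueError("need at least one type")
    # Walk the first chain from narrow to wide; the first shared entry wins.
    for candidate in chains[0]:
        if all(candidate in chain for chain in chains[1:]):
            return candidate
    raise ValueError("no common type found")
```

With this table, an int column in one file and a float column in another resolve to float, while int and boolean only meet at string.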

ieaves · 2020-12-23T13:48:49Z

Hey Majid, great observation. Although it's not exactly what you're looking for, we have a performance enhancement that leverages this fact in 'visions.typesets.typeset' called 'traverse_graph_with_sampled_series', which you can invoke directly for a quick speed-up. More broadly, if you use 'detect' instead of 'detect_type' (and likewise for the infer counterparts), you can pull the full inference path, which is a list of nodes from root to final type. You can then find the intersection of those paths for a column across your separate data sets to determine a best representation.
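
One way to read "intersection" here is the longest shared prefix of the root-to-final paths. The sketch below assumes that reading; `common_path_prefix` is a hypothetical helper, and the string node names are made-up stand-ins for whatever nodes the paths actually contain.

```python
from typing import Hashable, List, Sequence


def common_path_prefix(paths: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Return the longest prefix shared by every root-to-final path."""
    if not paths:
        return []
    prefix: List[Hashable] = []
    # zip stops at the shortest path, so the prefix never overruns any path.
    for nodes in zip(*paths):
        if all(node == nodes[0] for node in nodes):
            prefix.append(nodes[0])
        else:
            break
    return prefix


# Made-up paths for the same column in two files (illustrative names only).
path_file_a = ["Generic", "Float", "Integer"]
path_file_b = ["Generic", "Float"]
shared = common_path_prefix([path_file_a, path_file_b])
best_type = shared[-1] if shared else None
```

The last node of the shared prefix is the deepest type both files agree on; if the paths diverge right after the root, only the root remains.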


ieaves · 2020-12-23T15:09:01Z

I should add: if you're interested in making a PR for this use case, it would be more than welcome.

A basic implementation would look something like this:

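def cast_along_path(series, graph, path, state=None):
    # Avoid sharing one mutable default dict across calls
    state = {} if state is None else state
    # Walk consecutive (source, target) pairs so each hop uses the right edge
    for base_type, vision_type in zip(path, path[1:]):
        relation = graph[base_type][vision_type]["relationship"]
        series = relation.transform(series, state)
    return series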

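It could be invoked like this:

import pandas as pd
from visions.types import Generic, Object, String

T = typeset  # any visions typeset instance
s = pd.Series([...])  # your data
path = [Generic, Object, String]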

# Type Detection
new_s = cast_along_path(s, T.base_graph, path)

# Type Inference
new_s = cast_along_path(s, T.relation_graph, path)
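
To try the path-walking logic without visions installed, here is a small sketch using stand-ins: `StubRelation` and the nested-dict `stub_graph` are hypothetical and only mimic the `graph[source][target]["relationship"]` lookup used above, with a plain list in place of a pandas Series.

```python
from typing import Any, Callable, Dict, List, Optional


class StubRelation:
    """Hypothetical stand-in for a relation object exposing transform()."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def transform(self, values: List[Any], state: Dict[str, Any]) -> List[Any]:
        # Apply the conversion element-wise; state is unused in this stub.
        return [self.func(value) for value in values]


# Nested dicts mimic the graph[source][target]["relationship"] edge lookup.
stub_graph = {
    "Generic": {"Object": {"relationship": StubRelation(lambda v: v)}},
    "Object": {"String": {"relationship": StubRelation(str)}},
}


def cast_along_path(values, graph, path, state: Optional[dict] = None):
    """Apply each edge's transform in order along the given path."""
    state = {} if state is None else state
    for source, target in zip(path, path[1:]):
        values = graph[source][target]["relationship"].transform(values, state)
    return values


converted = cast_along_path([1, 2.5], stub_graph, ["Generic", "Object", "String"])
```

Here `converted` ends up as a list of strings, since the only non-identity hop in the stub graph is the Object to String edge.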

majidaldo added the 'enhancement' (New feature or request) label on Dec 22, 2020
