Window and stride arguments are making it harder to use the package. feature_collection.reduce example #76

Open
arturdaraujo opened this issue Sep 28, 2022 · 2 comments
Labels
enhancement New feature or request question Further information is requested

Comments

arturdaraujo commented Sep 28, 2022

First of all, this package is awesome. The community that works with time series data needed someone to raise the bar, and tsflex has everything it needs to become the main library.

However, here are a few specific suggestions:

Remove the "windows" and "strides" arguments altogether from feature extraction:
This may seem excessive, but hear me out. They are good arguments, but they are not fundamental to feature extraction. They could be handled during data preparation instead: Alteryx has a library called "compose" (https://github.com/alteryx/compose) whose sole purpose is creating multiple time frame windows. Once the window is ready, just select the functions. I propose that tsflex's main function (feature_collection.calculate) take only time series data and a list of functions for feature extraction, with no window or stride.

To explain further:
The way I see it, the essential interface would be just this: feature_collection.calculate(time_series_df, functions).
If any column of the time series had a data type other than int or float, it could simply raise an error or ignore the column.
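To make the proposal concrete, here is a minimal sketch of what such a window-free call could look like. It is not tsflex code: `calculate_whole_series`, its `on_non_numeric` parameter, and the plain dict used in place of a DataFrame are all hypothetical names chosen for illustration.

```python
from numbers import Real
from statistics import mean
from typing import Callable, Dict, Mapping, Sequence


def calculate_whole_series(
    series: Mapping[str, Sequence],
    functions: Sequence[Callable[[Sequence[float]], float]],
    on_non_numeric: str = "raise",
) -> Dict[str, float]:
    """Apply every function to every numeric column, using all rows at once.

    Hypothetical helper (not part of tsflex): there is no window or stride,
    each feature is computed over the complete column.
    """
    features: Dict[str, float] = {}
    for name, values in series.items():
        # bool is a subclass of int, so it is excluded explicitly.
        is_numeric = all(
            isinstance(v, Real) and not isinstance(v, bool) for v in values
        )
        if not is_numeric:
            if on_non_numeric == "raise":
                raise TypeError(f"column {name!r} is not numeric")
            continue  # any other setting: silently skip the column
        for func in functions:
            # The feature name holds only the column and the function name.
            features[f"{name}__{func.__name__}"] = func(values)
    return features


stock = {"Open": [10.0, 12.0, 14.0], "Ticker": ["A", "A", "A"]}
print(calculate_whole_series(stock, [mean, max], on_non_numeric="ignore"))
```

With the default `on_non_numeric="raise"`, the same call would fail on the "Ticker" column, which covers the "raise an error" alternative mentioned above.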

Window and stride also make the feature_collection.reduce function hard to use:
After feature selection, once I have picked a few of the many columns created by tsflex, I call reduce, which gives me the functions needed for transformation/extraction. The problem is that the naming convention includes the window and stride (e.g. Open__mean__w=233500_s=233500), which means I need a time series with the same characteristics/size, which often isn't the case. I use the windows and strides arguments as follows:

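from tsflex.features import FeatureCollection, MultipleFeatureDescriptors
from tsflex.features.integrations import tsfresh_settings_wrapper

# settings, stock_data and stock_full are defined earlier in my script.
# One descriptor per tsfresh function, each computed over a single window
# that spans (almost) the whole series.
simple_feats = MultipleFeatureDescriptors(
    functions=tsfresh_settings_wrapper(settings),
    series_names="Open",
    windows=len(stock_data) - 1,
    strides=len(stock_data) - 1,
)
feature_collection = FeatureCollection(simple_feats)
features_df = feature_collection.calculate(
    stock_full, return_df=True, show_progress=True, approve_sparsity=True
)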

I use this because I need to process the whole dataset.
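To illustrate the naming issue, here is a small sketch that splits a feature column name into its parts and drops the window/stride suffix. It assumes every name follows the single example quoted above (`Open__mean__w=233500_s=233500`); names produced by other tsflex versions or configurations may differ. `parse_feature_name` and `strip_window_stride` are hypothetical helpers, not tsflex functions.

```python
import re
from typing import NamedTuple


class FeatureName(NamedTuple):
    series: str
    feature: str
    window: str
    stride: str


# Assumed pattern: <series>__<feature>__w=<window>_s=<stride>
_FEATURE_NAME = re.compile(
    r"^(?P<series>.+?)__(?P<feature>.+)__w=(?P<window>[^_]+)_s=(?P<stride>[^_]+)$"
)


def parse_feature_name(name: str) -> FeatureName:
    """Split a feature column name into series, feature, window and stride."""
    match = _FEATURE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected feature name format: {name!r}")
    return FeatureName(**match.groupdict())


def strip_window_stride(name: str) -> str:
    """Return the name without its window/stride suffix."""
    parsed = parse_feature_name(name)
    return f"{parsed.series}__{parsed.feature}"


print(parse_feature_name("Open__mean__w=233500_s=233500"))
print(strip_window_stride("Open__mean__w=233500_s=233500"))
```

Two datasets of different lengths would produce different `w=` and `s=` values for what is conceptually the same feature, which is why the selected column names do not carry over from one dataset to another.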

Anyway, I hope this is helpful.

@arturdaraujo arturdaraujo changed the title Window and stride arguments are making it harder to use the package. feature_collection.reduce doesn't make sense Window and stride arguments are making it harder to use the package. feature_collection.reduce example Sep 28, 2022
Member

jvdd commented Sep 28, 2022

Hi, thanks for creating this issue @arturdaraujo! We are always happy to hear feedback from the community 😄

I'll discuss your remarks with @jonasvdd & @emield12 (the 2 other maintainers) and will keep you posted.

Cheers, Jeroen

P.S. In PR #71 I already decoupled the window & stride from the feature descriptors 😉

@jvdd jvdd added enhancement New feature or request question Further information is requested labels Sep 28, 2022
Member

jvdd commented Oct 11, 2022

tsflex v0.3 has just been released! 🎉
