How to extract only a specific subset of features for real-time prediction? #1032
-
Hi, I'm working on a time-series classification task, with the peculiarity that the prediction part has to run in real time with the following logic: every 1 second I look at the past 2.5 seconds and classify them as 1 or 0. That gives me at most 1 second for each inference. My time series is 30 timesteps x 63 features.
I use

```python
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import impute

X = extract_features(df.drop('target', axis=1), column_id='id', column_sort=None,
                     impute_function=impute,
                     n_jobs=1)
```

to extract a large number of features offline, without any time pressure, and then I perform feature selection with

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

SelectFromModel(
    estimator=LogisticRegression(
        C=0.1,
        penalty='l1',
        solver='liblinear',
        max_iter=1000,
        random_state=42
    )
)
```

and I'm able to get very good performance with a …
To me it looks pointless to extract all the features when I know exactly which ones I need, and they are different features depending on which of the 63 time series is being considered. Is there a way to tell the extractor exactly which ones I need? Ideally in a flexible way, without hardcoding everything.
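Whatever extraction settings end up being used, the first step is recovering the names of the columns the selector kept. Below is a minimal sketch, assuming `X` is the feature matrix returned by `extract_features`; here it is replaced by random data with hypothetical tsfresh-style column names (`sensor_a__mean` and so on) and a synthetic target, so only the mechanics carry over, not the numbers.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Stand-in for the output of extract_features: random data with
# hypothetical tsfresh-style column names ("<kind>__<feature>").
rng = np.random.default_rng(42)
X = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["sensor_a__mean", "sensor_a__maximum",
             "sensor_b__mean", "sensor_b__variance"],
)
# Synthetic target that depends mostly on sensor_a__mean.
y = (X["sensor_a__mean"] + 0.5 * X["sensor_b__mean"] > 0).astype(int)

# Same selector configuration as in the question.
selector = SelectFromModel(
    estimator=LogisticRegression(
        C=0.1, penalty="l1", solver="liblinear",
        max_iter=1000, random_state=42,
    )
)
selector.fit(X, y)

# get_support() returns a boolean mask over the input columns;
# use it to recover the names of the features that were kept.
selected_columns = list(X.columns[selector.get_support()])
print(selected_columns)
```

In scikit-learn 1.0 and later, `selector.get_feature_names_out()` returns the same names. They follow tsfresh's `<kind>__<feature>` pattern (with parameters appended for parametrized calculators), which is what makes it possible to map them back to extraction settings.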
-
Yes, this should indeed be possible!
When extracting features, you can pass extraction settings that define which feature calculators to use, and with which parameters. You can choose one of the predefined settings, but you can also create one tailored to your own needs. We have a convenience function for this, `from_columns` (https://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_extraction.html#tsfresh.feature_extraction.settings.from_columns). My suggestion would be:
You m…
You can read more about the things I described here: https://tsfresh.readthedocs.io/en/latest/text/feature_extraction_settings.html