
Feature Selection by FLAML? #258


Open
knoam opened this issue Oct 19, 2021 · 4 comments

Comments

@knoam
Collaborator

knoam commented Oct 19, 2021

Could you also use FLAML to select an optimal subset of features, perhaps using fewer features at first, then increasing, similar to how model complexity increases during training?

@qingyun-wu
Contributor

Hi @knoam, thank you for your question, and that's an interesting idea. Presumably you can do this by creating a customized learner: 1. add the number of features to use as a hyperparameter in the search space; 2. before the actual training, do feature selection according to the number of features suggested by FLAML to get the features used for training. One underlying assumption of this approach is that your features are ordered by importance, or that the order does not matter much, so that a decision can be made based on the number of features alone. Let me know what you think!

Thank you!
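The customized-learner idea above could be sketched roughly as follows. This is a hypothetical illustration, not FLAML's actual API: the class name `FeatureSubsetRF` and the hyperparameter name `n_features` are made up, and scikit-learn's `SelectKBest` stands in for whatever feature-selection method you prefer. An estimator like this could then be wrapped as a FLAML custom learner whose search space includes `n_features`.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

class FeatureSubsetRF(BaseEstimator):
    """Random forest that first selects the top `n_features` features.

    `n_features` is the extra hyperparameter that a tuner (e.g. FLAML)
    would suggest; feature selection runs before the actual training.
    """

    def __init__(self, n_features=5, n_estimators=100):
        self.n_features = n_features
        self.n_estimators = n_estimators

    def fit(self, X, y):
        # Select the k best features according to a univariate score,
        # then train only on those columns.
        k = min(self.n_features, X.shape[1])
        self.selector_ = SelectKBest(f_classif, k=k).fit(X, y)
        self.model_ = RandomForestClassifier(n_estimators=self.n_estimators)
        self.model_.fit(self.selector_.transform(X), y)
        return self

    def predict(self, X):
        # Apply the same feature subset at prediction time.
        return self.model_.predict(self.selector_.transform(X))
```

A search could then start with a small `n_features` (a low-cost initial value) and increase it, mirroring how model complexity grows during tuning.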

@jw00000
Collaborator

jw00000 commented Oct 20, 2021

From what I gleaned from playing with autosklearn, their approach is to build the search space so that it searches for the best 'pipeline', where a pipeline includes some preprocessing steps as well as the estimator. So the search space includes hyperparameters defining which preprocessing components to use and with what hyperparameters. Among the preprocessing choices are many sklearn transformers, including feature selection transformers. As a result, I think the search space is rather large. I'm curious what you think about this approach. Would it make the search space too large to be practical? Do you think it would improve the quality of the models?

@qingyun-wu
Contributor

Hi @jw00000, thank you for sharing your experience with autosklearn and your suggestions. Including the preprocessing component in the search space (as a hyperparameter with categorical choices) is a very reasonable approach, especially when the number of preprocessing choices is not that large, e.g., it should still be practical when the number is less than 5. Regarding the impact on model quality: if the time/resource budget is abundant, the model quality presumably won't get worse, although when the budget is small, the quality of the resulting model may be degraded. We haven't tried this yet. Do you want to give it a try? If so, we'd like to know how it works. Thank you!
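A minimal sketch of what "preprocessing as a categorical hyperparameter" could look like, assuming scikit-learn transformers and a small menu of choices. The names (`PREPROCESSORS`, `build_pipeline`, the option labels) are illustrative, not autosklearn or FLAML code; in a tuner, `preprocessor` would be a categorical hyperparameter ranging over `list(PREPROCESSORS)`.

```python
from sklearn.base import clone
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A small, fixed menu of preprocessing choices (fewer than 5, as discussed).
PREPROCESSORS = {
    "none": None,
    "scale": StandardScaler(),
    "pca": PCA(n_components=0.95),
    "select_k": SelectKBest(f_classif, k=5),
}

def build_pipeline(preprocessor="none", n_estimators=100):
    """Build a pipeline whose preprocessing step is picked by name.

    `preprocessor` is the categorical hyperparameter; `clone` gives each
    pipeline its own fresh copy of the chosen transformer.
    """
    steps = []
    if PREPROCESSORS[preprocessor] is not None:
        steps.append(clone(PREPROCESSORS[preprocessor]))
    steps.append(RandomForestClassifier(n_estimators=n_estimators))
    return make_pipeline(*steps)
```

With only a handful of options, the categorical choice multiplies the search space by a small constant, which matches the comment that it should remain practical when the number of choices is under 5.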

@sonichi
Contributor

sonichi commented Dec 27, 2021

@knoam what is a metric you'd like to optimize when doing feature selection? With what constraint?


4 participants