
Feature Selection by FLAML? #258


Open
knoam opened this issue Oct 19, 2021 · 4 comments

Comments

@knoam
Collaborator

knoam commented Oct 19, 2021

Could you also use FLAML to select an optimal subset of features, perhaps using fewer features at first, then increasing, similar to how model complexity increases during training?

@qingyun-wu
Contributor

Hi @knoam, thank you for your question, and that's an interesting idea. Presumably you can do this by creating a customized learner: 1. add the number of features to use as a hyperparameter in the search space; 2. before the actual training, do feature selection according to the number of features suggested by FLAML to get the features used for training. One underlying assumption of this approach is that your features are ordered by importance, or that the order does not matter much, so that a decision can be made based on the number of features alone. Let me know what you think!

Thank you!
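The customized-learner idea above could be sketched roughly as follows. This is a hypothetical illustration, not FLAML's actual API: the class name `FeatureSubsetRF` and the hyperparameter name `n_features` are made up, and scikit-learn's `SelectKBest` stands in for whatever feature-selection method you prefer. An estimator like this could then be wrapped as a FLAML custom learner whose search space includes `n_features`.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

class FeatureSubsetRF(BaseEstimator):
    """Random forest that first selects the top `n_features` features.

    `n_features` is the extra hyperparameter that a tuner (e.g. FLAML)
    would suggest; feature selection runs before the actual training.
    """

    def __init__(self, n_features=5, n_estimators=100):
        self.n_features = n_features
        self.n_estimators = n_estimators

    def fit(self, X, y):
        # Select the k best features according to a univariate score,
        # then train only on those columns.
        k = min(self.n_features, X.shape[1])
        self.selector_ = SelectKBest(f_classif, k=k).fit(X, y)
        self.model_ = RandomForestClassifier(n_estimators=self.n_estimators)
        self.model_.fit(self.selector_.transform(X), y)
        return self

    def predict(self, X):
        # Apply the same feature subset at prediction time.
        return self.model_.predict(self.selector_.transform(X))
```

A search could then start with a small `n_features` (a low-cost initial value) and increase it, mirroring how model complexity grows during tuning.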

@jw00000
Collaborator

jw00000 commented Oct 20, 2021

From what I gleaned from playing with autosklearn, their approach is to build the search space so that it searches for the best 'pipeline', where a pipeline includes some preprocessing steps as well as the estimator. So the search space includes hyperparameters defining which preprocessing components to use and with what hyperparameters. Among the preprocessing choices are many sklearn transformers, including feature selection transformers. As a result, I think the search space is rather large. I'm curious what you think about this approach. Would it make the search space too large to be practical? Do you think it would improve the quality of the models?

@qingyun-wu
Contributor

Hi @jw00000, thank you for sharing your experience with autosklearn and your suggestions. Including the preprocessing component in the search space (as a hyperparameter with categorical choices) is a very reasonable approach, especially when the number of preprocessing choices is not that large, e.g., it should still be practical when the number is less than 5. Regarding the impact on model quality: if the time/resource budget is abundant, the model quality presumably won't get worse, although when the budget is small, the quality of the resulting model may be degraded. We haven't tried this yet. Do you want to give it a try? If so, we'd like to know how it works. Thank you!
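A minimal sketch of what "preprocessing as a categorical hyperparameter" could look like, assuming scikit-learn transformers and a small menu of choices. The names (`PREPROCESSORS`, `build_pipeline`, the option labels) are illustrative, not autosklearn or FLAML code; in a tuner, `preprocessor` would be a categorical hyperparameter ranging over `list(PREPROCESSORS)`.

```python
from sklearn.base import clone
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A small, fixed menu of preprocessing choices (fewer than 5, as discussed).
PREPROCESSORS = {
    "none": None,
    "scale": StandardScaler(),
    "pca": PCA(n_components=0.95),
    "select_k": SelectKBest(f_classif, k=5),
}

def build_pipeline(preprocessor="none", n_estimators=100):
    """Build a pipeline whose preprocessing step is picked by name.

    `preprocessor` is the categorical hyperparameter; `clone` gives each
    pipeline its own fresh copy of the chosen transformer.
    """
    steps = []
    if PREPROCESSORS[preprocessor] is not None:
        steps.append(clone(PREPROCESSORS[preprocessor]))
    steps.append(RandomForestClassifier(n_estimators=n_estimators))
    return make_pipeline(*steps)
```

With only a handful of options, the categorical choice multiplies the search space by a small constant, which matches the comment that it should remain practical when the number of choices is under 5.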

@sonichi
Contributor

sonichi commented Dec 27, 2021

@knoam what is a metric you'd like to optimize when doing feature selection? With what constraint?


4 participants