[WIP] get_feature_names_out for sklego.preprocessing. #544

Closed
wants to merge 47 commits into from
47 commits
22da776
Pass along arbitrary parameters to fit `EstimatorTransformer`
CarloLepelaars Sep 12, 2022
955a0ee
Remove *args option from `EstimatorTransformer.fit()`
CarloLepelaars Sep 12, 2022
8d93361
Setup test for passing additional arguments in `EstimatorTransformer.…
CarloLepelaars Sep 12, 2022
082ce5f
Test if `EstimatorTransformer` fit+transform is the same with sample_…
CarloLepelaars Sep 12, 2022
237edb4
`EstimatorTransformer` test_kwargs comments
CarloLepelaars Sep 12, 2022
3eff767
Use array to test passing of `sample_weight` in `EstimatorTransformer`
CarloLepelaars Sep 14, 2022
8ef9f70
Use more simple `LinearRegression` in `test_kwargs`
CarloLepelaars Sep 14, 2022
ae7b061
Update tests/test_meta/test_estimatortransformer.py
CarloLepelaars Sep 19, 2022
f06a7af
Update tests/test_meta/test_estimatortransformer.py
CarloLepelaars Sep 19, 2022
b0ca1a0
Use unittest.Mock to check if fit method works with added kwargs
CarloLepelaars Sep 19, 2022
15e25fc
Merge branch 'main' into main
CarloLepelaars Sep 26, 2022
c471311
Working solution to test `EstimatorTransformer.fit` with added kwargs
CarloLepelaars Sep 26, 2022
9af3a6d
Fix Python3.7 issue with `Mock().call_args` for non-keyword args.
CarloLepelaars Sep 26, 2022
50b4b06
Simplify `test_kwargs` so passing of `kwargs` is tested.
CarloLepelaars Sep 26, 2022
2102583
Remove redundant whitespace at bottom of tests file
CarloLepelaars Sep 26, 2022
c7df2aa
Fix Python3.7 issue for `Mock().call_args`
CarloLepelaars Sep 27, 2022
aa526aa
Merge branch 'koaning:main' into main
CarloLepelaars Sep 27, 2022
999c197
PoC for `get_feature_names_out` for `EstimatorTransformer`
CarloLepelaars Sep 27, 2022
78016af
Refine `get_feature_names_out` for `EstimatorTransformer`. Tests for …
CarloLepelaars Sep 27, 2022
ea9627a
Custom `check_is_fitted` requirements.
CarloLepelaars Sep 27, 2022
4325348
Remove redundant imports
CarloLepelaars Sep 27, 2022
7404a7f
Remove redundant check in `__sklearn_.is_fitted`
CarloLepelaars Sep 27, 2022
7978e1c
Clean up tests for `EstimatorTransformer`
CarloLepelaars Sep 28, 2022
b9706fd
Merge branch 'main' into feature/meta-feature-names-out
CarloLepelaars Oct 6, 2022
e8f1d19
New lines in docstrings
CarloLepelaars Oct 6, 2022
f6cc211
Merge branch 'koaning:main' into main
CarloLepelaars Oct 10, 2022
6bf9abb
`get_feature_names_out`+test for `ColumnCapper`
CarloLepelaars Oct 10, 2022
e471b6c
`get_feature_names_out` implementations for `ColumnCapper` and `DictM…
CarloLepelaars Oct 10, 2022
5aa735f
ValueError check for `get_feature_names_out` call without input_featu…
CarloLepelaars Oct 10, 2022
410f98a
`get_feature_names_out` for `IdentityTransformer` and test simplifica…
CarloLepelaars Oct 10, 2022
7e869fe
`get_feature_names_out` for `IntervalEncoder` and clean up `test_inte…
CarloLepelaars Oct 11, 2022
6a6cde2
`get_feature_names_out` for `ColumnDropper`
CarloLepelaars Oct 11, 2022
22e26b0
`get_feature_names_out` for `ColumnSelector`
CarloLepelaars Oct 11, 2022
d400f11
`get_feature_names_out` for `PandasTypeSelector`
CarloLepelaars Oct 11, 2022
6a65b3b
`get_feature_names_out` for `PatsyTransformer` and clean up of `test_…
CarloLepelaars Oct 11, 2022
fc29765
`get_feature_names_out` for `InformationFilter`, `OrthogonalTransform…
CarloLepelaars Oct 11, 2022
97317f1
Simplify `get_feature_names_out` for `ColumnCapper` and `DictMapper`
CarloLepelaars Oct 11, 2022
5bed904
Merge commit 'e8f1d19e' into feature/preprocessing-feature-names-out
CarloLepelaars Oct 11, 2022
369969b
Revert "Merge commit 'e8f1d19e' into feature/preprocessing-feature-na…
CarloLepelaars Oct 11, 2022
fcb3058
Simplify `get_feature_names_out` for `IntervalEncoder` and contributi…
CarloLepelaars Oct 11, 2022
54fecb2
Finetune contribution guidelines for new preprocessors
CarloLepelaars Oct 11, 2022
6c42050
Bump minimum Python version from 3.7 to 3.8 in Github Actions pipelines.
CarloLepelaars Oct 12, 2022
9e52aae
Bump build to Python 3.8 in `.gitpod.yml`
CarloLepelaars Oct 12, 2022
d46ae69
General test to check if `get_feature_names_out` is implemented for a…
CarloLepelaars Oct 12, 2022
96b39b3
Put back commented checks in `test_interval_encoder`
CarloLepelaars Oct 13, 2022
5a6d2b8
Link to contribution docs in readme
CarloLepelaars Oct 13, 2022
328af9a
Use sklearn checks to check get_feature_names_out for `sklego.preproc…
CarloLepelaars Oct 18, 2022
`get_feature_names_out` for `IdentityTransformer` and test simplification.
CarloLepelaars committed Oct 10, 2022
commit 410f98a93a616bd547e1e393af89c770a521e90b
5 changes: 3 additions & 2 deletions sklego/preprocessing/identitytransformer.py
@@ -1,10 +1,10 @@
-from sklearn.base import BaseEstimator, TransformerMixin
+from sklearn.base import BaseEstimator, TransformerMixin, _OneToOneFeatureMixin

 from sklearn.utils import check_array
 from sklearn.utils.validation import check_is_fitted


-class IdentityTransformer(BaseEstimator, TransformerMixin):
+class IdentityTransformer(BaseEstimator, TransformerMixin, _OneToOneFeatureMixin):
     """
     The identity transformer returns what it is fed. Does not apply anything useful.
     The reason for having it is because you can build more expressive pipelines.
@@ -26,6 +26,7 @@ def fit(self, X, y=None):
         if self.check_X:
             X = check_array(X, copy=True, estimator=self)
         self.shape_ = X.shape
+        self.n_features_in_ = X.shape[1]
         return self

     def transform(self, X):
2 changes: 1 addition & 1 deletion tests/test_preprocessing/test_columncapper.py
@@ -135,7 +135,7 @@ def test_get_feature_names_out(random_xy_dataset_clf):
     expected_feature_names = [f"feature_{i}" for i in n_col_range]
     np.testing.assert_array_equal(feature_names, expected_feature_names)

-    # Make sure get_feature_names_out cannot be called without given input_features if ColumnCapper is not fitted.
+    # get_feature_names_out should not work without given input_features if ColumnCapper is not fitted.
     with pytest.raises(ValueError):
         cc.get_feature_names_out(input_features=None)

2 changes: 1 addition & 1 deletion tests/test_preprocessing/test_dictmapper.py
@@ -72,7 +72,7 @@ def test_get_feature_names_out(random_xy_dataset_clf):
     expected_feature_names = ['foobar_feature']
     np.testing.assert_array_equal(feature_names, expected_feature_names)

-    # Make sure get_feature_names_out cannot be called without given input_features if DictMapper is not fitted.
+    # get_feature_names_out should not work without given input_features if DictMapper is not fitted.
     with pytest.raises(ValueError):
         dm.get_feature_names_out(input_features=None)

15 changes: 15 additions & 0 deletions tests/test_preprocessing/test_identitytransformer.py
@@ -36,3 +36,18 @@ def test_nan_inf(random_xy_dataset_regr):
     X[np.random.ranf(size=X.shape) > 0.9] = -np.inf
     X[np.random.ranf(size=X.shape) > 0.9] = np.inf
     X_new = IdentityTransformer(check_X=False).fit_transform(X)
+
+
+def test_get_feature_names_in(random_xy_dataset_regr):
+    X, y = random_xy_dataset_regr
+    it = IdentityTransformer()
+
+    # get_feature_names_out should not work without given input_features if IdentityTransformer is not fitted.
+    with pytest.raises(ValueError):
+        it.get_feature_names_out(input_features=None)
+
+    # Test with no input_features after being fitted
+    it.fit(X, y)
+    feature_names = it.get_feature_names_out()
+    expected_feature_names = [f"x{i}" for i in range(X.shape[1])]
+    np.testing.assert_array_equal(feature_names, expected_feature_names)
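
The two expectations in this test (a `ValueError` before `fit`, and `x0, x1, ...` afterwards) follow from how one-to-one name resolution is usually described. The sketch below is a simplified, pure-Python stand-in written for illustration: `resolve_feature_names` and its parameters are hypothetical names, and it is not scikit-learn's actual implementation.

```python
def resolve_feature_names(n_features_in=None, feature_names_in=None, input_features=None):
    """Simplified stand-in for one-to-one feature name resolution.

    Hypothetical helper for illustration; not scikit-learn's code.
    """
    if input_features is not None:
        names = list(input_features)
        # Explicit names must agree with whatever was seen during fit.
        if feature_names_in is not None and names != list(feature_names_in):
            raise ValueError("input_features does not match the names seen in fit")
        if n_features_in is not None and len(names) != n_features_in:
            raise ValueError("input_features has the wrong length")
        return names
    if feature_names_in is not None:
        # Names recorded during fit (for example, DataFrame columns) win.
        return list(feature_names_in)
    if n_features_in is None:
        # Unfitted and no names given: nothing to generate names from.
        raise ValueError("cannot generate feature names before fit")
    return [f"x{i}" for i in range(n_features_in)]


# Unfitted transformer, input_features=None: mirrors the pytest.raises branch.
try:
    resolve_feature_names()
    unfitted_raises = False
except ValueError:
    unfitted_raises = True

# Fitted on a plain array with three columns: mirrors the x{i} expectation.
generated = resolve_feature_names(n_features_in=3)

# Explicit names of the right length are passed through unchanged.
passed_through = resolve_feature_names(n_features_in=2, input_features=["a", "b"])
print(unfitted_raises, generated, passed_through)
```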