KeyError [...77789,77790] index not found error when I put AutoML in sklearn pipeline and run it #517

busekoseoglu · 2022-04-14T16:17:13Z

Code:

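from flaml import AutoML
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# One-hot encode the categorical columns.
categorical_transformer = Pipeline(steps=[('one_hot', OneHotEncoder())])
categorical_features = ['merchant_category', 'merchant_group', 'name_in_email']

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features)
    ])

# fill_missing and outlier_filling are custom transformers; their definitions are not included here.
clf = Pipeline(steps=[('missing', fill_missing()),
                      ('outlier', outlier_filling()),
                      ('preprocessor', preprocessor),
                      ('classifier', AutoML())])

clf.fit(X_train, y_train)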

Note: It works when AutoML is replaced with RandomForestClassifier.

sonichi · 2022-04-14T17:46:27Z

@busekoseoglu I can't reproduce this problem with my synthetic data for testing. Could you please share an example dataset to reproduce this problem?
BTW, you don't have to use one-hot encoding before AutoML.fit(). It often works better without this encoding.
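
A minimal sketch of the suggestion above, skipping the one-hot step and handing the DataFrame to AutoML directly. It assumes FLAML accepts pandas columns of `category` dtype as categorical features; the helper names `as_categorical` and `fit_without_one_hot`, the `task` value, and the `time_budget` default are illustrative assumptions, not something stated in this thread.

```python
import pandas as pd


def as_categorical(df, columns):
    """Return a copy of df with the given columns cast to the pandas category dtype."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype("category")
    return out


def fit_without_one_hot(X, y, categorical_features, time_budget=10):
    """Fit FLAML's AutoML on a DataFrame without a one-hot encoding step (hypothetical helper)."""
    # Imported lazily so as_categorical can be used without FLAML installed.
    from flaml import AutoML

    automl = AutoML()
    automl.fit(
        X_train=as_categorical(X, categorical_features),
        y_train=y,
        task="classification",    # assumption: the label is a class label
        time_budget=time_budget,  # seconds; an arbitrary small budget for a quick check
    )
    return automl
```

With the columns from this report, the call would look like `fit_without_one_hot(X_train, y_train, ['merchant_category', 'merchant_group', 'name_in_email'])`.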

busekoseoglu · 2022-04-15T15:40:18Z

Of course, I am attaching an example CSV file (sampledf.csv). I ran it without one-hot encoding, but I'm wondering whether it will work with one-hot encoding as well.

sonichi · 2022-04-15T16:29:58Z

This works for me:

import pandas as pd
from flaml import AutoML
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Load the attached sample data and split off the label column.
df = pd.read_csv("https://github.com/microsoft/FLAML/files/8496779/sampledf.csv")
X = df.drop(columns="has_paid")
y = df["has_paid"]

categorical_transformer = Pipeline(steps=[('one_hot', OneHotEncoder())])
categorical_features = ['merchant_category', 'merchant_group', 'name_in_email']

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features)
    ])

clf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', AutoML())])

clf.fit(X, y)

I removed the first two steps in your pipeline because fill_missing and outlier_filling are not defined in the code you posted:

('missing', fill_missing()),
('outlier', outlier_filling()),
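
To run the full four-step pipeline, the two missing steps need some definition. The sketch below is a hypothetical stand-in for `fill_missing`, assuming it is an ordinary scikit-learn transformer that takes and returns a pandas DataFrame. The class name `FillMissing` and the most-frequent-value strategy are assumptions for illustration and are not the reporter's code.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class FillMissing(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the reporter's fill_missing step."""

    def fit(self, X, y=None):
        # Remember the most frequent value of each column seen during fit.
        self.fill_values_ = X.mode().iloc[0]
        return self

    def transform(self, X):
        # fillna returns a new frame with the same rows and the same index as X,
        # so the number and order of rows still match y.
        return X.fillna(self.fill_values_)
```

It would be plugged in as `('missing', FillMissing())`. When comparing a custom step against a stand-in like this, one thing worth checking is whether the custom step drops rows or changes the index, since downstream estimators receive `y` unchanged.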
