[FLAML Crash] [Classification] ValueError: Categorical categories must be unique #548
Comments
Could you share the .csv file? A few lines are enough as long as the error can be reproduced. Also, could you let me know the flaml version?
Thanks. I received a warning when reading the csv:
Then, I found that the last row contains |
Thanks for your reply, @sonichi. This worked for the higgs dataset, but I can imagine it might appear again for other datasets. Any plans to fix this in future releases?
Your suggestion is welcome here. I don't know how common it is to use "?" for missing data, or how we are supposed to infer that without an explicit hint from users. For example, we can't simply replace all "?" by "" because it could be a legitimate value. What would you recommend to address this kind of ambiguity?
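For reference, one way a user can give that explicit hint today is at load time, before the data reaches FLAML. The sketch below is only an illustration: the inline CSV and the column names `a`, `b`, and `label` are made up, and the real Higgs file is not reproduced here. It uses the `na_values` parameter of `pandas.read_csv`, which accepts a per-column mapping.

```python
import io

import pandas as pd

# Hypothetical sample data; not the actual Higgs CSV from this issue.
raw = "a,b,label\n1,x,0\n?,y,1\n3,?,0\n"

# Declare "?" as a missing-value marker for column "a" only.
# Column "b" keeps "?" as an ordinary string, in case it is a real category.
df = pd.read_csv(io.StringIO(raw), na_values={"a": ["?"]})

print(df.dtypes)
print(df)
```

Passing a plain list such as `na_values=["?"]` would instead apply the marker to every column, which is exactly the blanket replacement that may not be safe.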
I have seen multiple OpenML datasets where the NaN values are stored as "?". I understand that a blind replacement of "?" with NaN might not be desired in some situations, but how about an analysis of the data types of the column values? If, e.g., 90%+ of the values are integers, then "?" can be interpreted as NaN. If FLAML already has column data type checks, this can be integrated into them.
Interesting idea. Do you suggest replacing "?" with NaN automatically if 90%+ of the values are integers and the remainder are "?"? What if I do use 0 and ? to represent two categories and I have 90% 0s in my data?
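To make the proposal concrete, here is a minimal sketch of such a check. It is not FLAML code; the function names, the `marker` argument, and the 0.9 default threshold are assumptions taken from the description above. It works on any iterable of raw values, so a pandas column read as strings could be passed in directly.

```python
def is_int_like(value: str) -> bool:
    """Return True if the string parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def marker_looks_like_missing(values, marker="?", threshold=0.9) -> bool:
    """Heuristic: treat `marker` as NaN if the column is otherwise integer.

    The column qualifies when it contains the marker at least once, every
    other value parses as an integer, and integers make up at least
    `threshold` of all values.
    """
    cleaned = [str(v).strip() for v in values]
    if not cleaned:
        return False
    n_marker = sum(v == marker for v in cleaned)
    n_int = sum(is_int_like(v) for v in cleaned)
    if n_marker == 0 or n_int + n_marker != len(cleaned):
        return False
    return n_int / len(cleaned) >= threshold


mostly_ints = ["1"] * 19 + ["?"]
mixed_text = ["a", "b", "?"]
print(marker_looks_like_missing(mostly_ints))
print(marker_looks_like_missing(mixed_text))
```

A check like this only looks at value shapes, so it cannot tell a missing-value marker apart from a rare but genuine category; it would need to be opt-in or overridable.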
Hey, thanks for the great system.
I am experiencing a crash with a specific dataset. I get the following error when I fit FLAML on the Higgs dataset, a binary classification dataset used in the FLAML paper:
Here is my script:
Your feedback is appreciated.