
get_model_data fails when trying to extract model data with monotonized features #15

Open
Krzys25 opened this issue Aug 24, 2023 · 2 comments


Krzys25 commented Aug 24, 2023

Hello,

I encountered an issue when trying to use the get_model_data function to extract model data from an ExplainableBoostingClassifier instance with monotonized features.

Steps to Reproduce:

  1. Fit an ExplainableBoostingClassifier instance.
  2. Apply the monotonize method for one of the features.
  3. Attempt to extract the model data using get_model_data.

Expected Behavior:
The function should return the model data successfully.

Actual Behavior:
The function fails with a TypeError at this line. Upon further investigation, I noticed that the entry in ebm.standard_deviations_ for the feature I applied monotonize to has been set to None.

Workaround:
Currently, I'm manually replacing the None value with a placeholder value, but I believe this should be handled gracefully by the library itself.
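For anyone hitting the same error, a minimal sketch of that workaround, using plain numpy lists as stand-ins for the fitted EBM's per-term attributes (the attribute names follow the issue; the shapes here are purely illustrative, not taken from a real model):

```python
import numpy as np

# Stand-ins for a fitted EBM's per-term arrays. After monotonize(), the
# affected term's standard-deviations entry was observed to be None while
# its scores remain a normal array.
term_scores = [np.arange(10.0), np.arange(8.0)]  # hypothetical scores per term
standard_deviations = [np.ones(10), None]        # term 1 was monotonized

# Workaround: replace any None entry with zeros shaped like that term's
# scores, so code that iterates both lists no longer hits a TypeError.
standard_deviations = [
    np.zeros_like(scores) if stds is None else stds
    for stds, scores in zip(standard_deviations, term_scores)
]
```

Using zeros (rather than an arbitrary placeholder) keeps any derived uncertainty bands collapsed onto the scores themselves.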

Please let me know if any further information is required or if there's a known solution to this problem.

Thanks in advance for your help.

Best,
Krzysztof

@HenrikSmith

Hello,

I noticed a similar behaviour with the ExplainableBoostingRegressor for a model with monotonized features.

get_model_data seems to work fine as long as I do not modify the bins in any way. However, as soon as the bin boundaries move or the overall number of bins changes, I keep getting a ValueError: operands could not be broadcast together with shapes (10,) (9,). The ebm.standard_deviations_ are returned as zeros, but the dimension is always one smaller than it is supposed to be.

I can fix that by adding the missing zero to the respective array. However, I'm now facing other issues:

  • When calculating ebm.global_explanation(), get_model_data appears to have updated the scores but not the bin_weights, so I'm getting a TypeError: Axis must be specified when shapes of a and weights differ from this line.
  • During inference with ebm.predict_with_uncertainty(), it seems to be the other way round: I'm getting an IndexError: index 10 is out of bounds for axis 0 with size 10 from this line, because term_scores seems to lack some of the scores requested via bin_indexes.
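The shape mismatch and the "adding the missing zero" fix can be sketched with plain numpy arrays (hypothetical lengths matching the error message above; not a real EBM):

```python
import numpy as np

scores = np.linspace(-1.0, 1.0, 10)  # hypothetical per-bin scores, length 10
stds = np.zeros(9)                   # standard deviations came back one entry short

# `scores + stds` at this point would raise:
# ValueError: operands could not be broadcast together with shapes (10,) (9,)

# Quick fix: pad the short array with trailing zeros to match the score length.
if stds.shape[0] < scores.shape[0]:
    stds = np.pad(stds, (0, scores.shape[0] - stds.shape[0]))

upper_band = scores + stds  # now broadcasts fine
```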

Is there any way I can work around this?

Thank you very much in advance for your help!

@HenrikSmith

Hello,

I investigated a little more and may have found the root causes of the issues. For the variable modified via gamchanger:

  • the respective sub-array of ebm.standard_deviations_ is reset to zero but appears to be one entry short (a quick fix may be: replace with array of zeros with correct length)
  • the respective sub-array of ebm.bin_weights_ is not updated and hence still has the length (and probably also the content) of the original model (a quick fix may be: replace with array of ones with correct length)
  • the respective sub-array of ebm.bagged_scores_ is not updated and hence still has the length and content of the original model (a quick fix may be: replace all bags with the updated ebm.term_scores_ sub-array)
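Taken together, a rough sketch of these three quick fixes, again on numpy stand-ins rather than a real EBM (the function name and argument layout are made up for illustration; only the attribute names follow the issue):

```python
import numpy as np

def patch_edited_term(term_scores, standard_deviations, bin_weights,
                      bagged_scores, term_idx):
    """Apply the three quick fixes above to one edited term.

    All arguments are lists of numpy arrays standing in for the
    corresponding ebm.* attributes; this is an illustrative sketch,
    not the library's API.
    """
    n = len(term_scores[term_idx])
    # 1. standard deviations: array of zeros with the correct length
    standard_deviations[term_idx] = np.zeros(n)
    # 2. bin weights: array of ones with the correct length
    bin_weights[term_idx] = np.ones(n)
    # 3. bagged scores: replace every bag with the updated term scores
    n_bags = bagged_scores[term_idx].shape[0]
    bagged_scores[term_idx] = np.tile(term_scores[term_idx], (n_bags, 1))

# Example: one term whose bins grew from 9 to 10 entries after editing.
term_scores = [np.linspace(-1, 1, 10)]
standard_deviations = [np.zeros(9)]  # one entry short
bin_weights = [np.ones(9)]           # stale length
bagged_scores = [np.zeros((4, 9))]   # 4 bags, stale length
patch_edited_term(term_scores, standard_deviations, bin_weights,
                  bagged_scores, 0)
```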

I haven't tested this thoroughly yet, but maybe it is still of help to you or anyone else.

Thank you very much for your work and best regards!
