refactor: Rework Categorical/Enum to use (Frozen)Categories #23016
Codecov Report
Attention: Patch coverage is [...]

@@            Coverage Diff             @@
##             main   #23016      +/-   ##
==========================================
+ Coverage   80.68%   80.87%   +0.18%
==========================================
  Files        1645     1632      -13
  Lines      221895   220133    -1762
  Branches     2783     2782       -1
==========================================
- Hits       179036   178027    -1009
+ Misses      42197    41445     -752
+ Partials      662      661       -1
Yeah bu' |
Some small comments. Super nice to land this! Could you do a doc update maybe?
@coastalwhite I addressed most of your concerns, please respond to the others.
Fixes #3036.
Fixes #14247.
Fixes #14996.
Fixes #15293.
Fixes #15781.
Fixes #17479.
Fixes #17643.
Fixes #18065.
Fixes #18501.
Fixes #19868.
Fixes #19943.
Fixes #20290.
Fixes #20318.
Fixes #20364.
Fixes #20562.
Fixes #20878.
Fixes #20931.
Fixes #21175.
Fixes #21583.
Fixes #22448.
Fixes #22586.
Fixes #22664.
Fixes #22830.
Fixes #23015.
Fixes #23071.
Fixes #23289.
This PR essentially replaces the entire Categorical/Enum implementation. Unfortunately, some breakage was unavoidable:

- The physical ordering of `Categorical`s has been removed; the ordering is now always lexical. The parameter has been deprecated: it is not a hard error to pass `"physical"` as ordering, it just doesn't do anything anymore.
- [...] `Enum`s in them are read back as `Categorical`s by older versions of Polars.
- Casts between `Categorical` and integer types now always refer to the physical categories. These casts will be deprecated and removed at a later stage once we have dedicated functions to go to/from categories. The casts to/from `String` still exist and will remain; any other casts have been removed.
- The concept of local and global categories is gone. The `StringCache` still exists in Python but does nothing anymore, and will be deprecated and removed later.

In a future PR we will expose the new capabilities of the new `Categories` system, which lets you specify in the DataType which columns should share the same categorical mapping.
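
To make the terms above concrete, here is a toy model in plain Python of a categorical mapping shared by two columns, and of the difference between physical ordering (by integer code) and lexical ordering (by string). `ToyCategories` and every name in it are hypothetical and exist only for illustration; this is not how Polars implements `Categories`, and it assumes codes are assigned in first-seen order.

```python
# Toy model only: a hypothetical stand-in, not the Polars implementation.
class ToyCategories:
    """Maps each distinct string to an integer code, in first-seen order."""

    def __init__(self):
        self._code_of = {}   # string -> physical code
        self._strings = []   # physical code -> string

    def encode(self, values):
        """Return the physical codes for `values`, registering new strings."""
        codes = []
        for value in values:
            if value not in self._code_of:
                self._code_of[value] = len(self._strings)
                self._strings.append(value)
            codes.append(self._code_of[value])
        return codes

    def decode(self, codes):
        """Map physical codes back to their strings."""
        return [self._strings[code] for code in codes]


# Two columns sharing one mapping: "apple" gets the same code in both.
shared = ToyCategories()
col_a = shared.encode(["pear", "apple", "pear"])
col_b = shared.encode(["apple", "fig"])

# Physical ordering sorts by integer code (first-seen order here).
physical_order = shared.decode(sorted(col_a))

# Lexical ordering sorts by the string each code stands for.
lexical_order = sorted(shared.decode(col_a))

print(col_a, col_b)      # [0, 1, 0] [1, 2]
print(physical_order)    # ['pear', 'pear', 'apple']
print(lexical_order)     # ['apple', 'pear', 'pear']
```

In this model, a cast to an integer type corresponds to exposing the codes returned by `encode`, and a cast to `String` corresponds to `decode`.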