GH-44500: [Python][Parquet] Map Parquet logical types to Arrow extension types by default (#46772)
### Rationale for this change
The Parquet C++ implementation now supports reading four logical types (JSON, UUID, Geometry, Geography) as Arrow extension types; however, users have to opt in to avoid losing the logical type on read.
### What changes are included in this PR?
This PR sets the default value of `arrow_extensions_enabled` to `True` (in Python).
### Are these changes tested?
Yes, the behaviour of `arrow_extensions_enabled` was already tested (and tests were updated to reflect the new default value).
### Are there any user-facing changes?
**This PR includes breaking changes to public APIs.**
Reading Parquet files that contain a JSON or UUID logical type now produces columns with an extension type rather than string or fixed-size binary, respectively. Python users relying on the previous behaviour will need to either cast explicitly to the storage type or use `read_table(..., arrow_extensions_enabled=False)` after this PR:
```python
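import uuid

import pyarrow as pa

# JSON extension array -> plain string (its storage type)
json_array = pa.array(['{"k": "v"}'], pa.json_())
json_array.cast(pa.string())
#> [
#> "{"k": "v"}"
#> ]

# UUID extension array -> 16-byte fixed size binary (its storage type)
uuid_array = pa.array([uuid.uuid4().bytes], pa.uuid())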
uuid_array.cast(pa.binary(16))
#> <pyarrow.lib.FixedSizeBinaryArray object at 0x11e42b1c0>
#> [
#> 746C1022AB434A97972E1707EC3EE8F4
#> ]
```
* GitHub Issue: #44500
Authored-by: Dewey Dunnington
Signed-off-by: AlenkaF