-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forbid modifying names of DataTree objects with parents #9494
Conversation
This ensures that the tree cannot be in an inconsistent state. Fixes pydata#9447
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has a lot in common with #9492, but I say just merge this one first and I will update the other one to match.
_validate_name(name) # is this check redundant? | ||
self._name = name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's redundant, at least in the sense that the original code uses the .name
property setter, which now calls _validate_name
. Other than that it is necessary though, because you can get down to this method by following
xarray/xarray/core/treenode.py
Line 180 in 18e5c87
child._set_parent(new_parent=self, child_name=name) |
which is in the .children
setter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually you could instead move this check to be inside the ._check_children
staticmethod on TreeNode
, which will get called first. I think that would be a clearer place to put the check, and you should then be able to remove the NamedNode._post_attach
method entirely!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside I think this brings us closer to seeing that NamedNode
is almost entirely unnecessary, if it weren't for the possibility of a root node having a name. Using TreeNode
alone nodes would still have names, but those names are stored only as child keys on the parent.
So it makes sense that here we are able to refactor such that every name validation is actually done on TreeNode
, with the exceptions of inside NamedNode.__init__
and the .name
setter, which are the only places we could possibly check the name of the root node (as it has no parent to perform the validation via).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm was I wrong about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does seems so... honestly I find the various hooks on the base TreeNode pretty hard to reason about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn. I guess just put the validation check back in, merge, then we can refactor with that regression test to adhere to?
I find the various hooks on the base TreeNode pretty hard to reason about
I normally find them okay - I want to look at this issue more closely. Of all the hooks defined on TreeNode we only actually use like 2 of them, the rest are holdovers from the original design copying the anytree library.
Co-authored-by: Tom Nicholas <[email protected]>
* main: (26 commits) Forbid modifying names of DataTree objects with parents (pydata#9494) DAS-2155 - Merge datatree documentation into main docs. (pydata#9033) Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378) Ensure TreeNode doesn't copy in-place (pydata#9482) `open_groups` for zarr backends (pydata#9469) Update pyproject.toml (pydata#9484) New whatsnew section (pydata#9483) Release notes for v2024.09.0 (pydata#9480) Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451) Rename DataTree's "ds" and "data" to "dataset" (pydata#9476) Update DataTree repr to indicate inheritance (pydata#9470) Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460) Repo checker (pydata#9450) Add days_in_year and decimal_year to dt accessor (pydata#9105) remove parent argument from DataTree.__init__ (pydata#9465) Fix inheritance in DataTree.copy() (pydata#9457) Implement `DataTree.__delitem__` (pydata#9453) Add ASV for datatree.from_dict (pydata#9459) Make the first argument in DataTree.from_dict positional only (pydata#9446) Fix typos across the code, doc and comments (pydata#9443) ...
* main: Opt out of floor division for float dtype time encoding (pydata#9497) fixed formatting for whats-new (pydata#9493) Forbid modifying names of DataTree objects with parents (pydata#9494) DAS-2155 - Merge datatree documentation into main docs. (pydata#9033) Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378) Ensure TreeNode doesn't copy in-place (pydata#9482) `open_groups` for zarr backends (pydata#9469) Update pyproject.toml (pydata#9484) New whatsnew section (pydata#9483)
This ensures that the tree cannot be in an inconsistent state.