Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forbid modifying names of DataTree objects with parents #9494

Merged
merged 5 commits into from
Sep 14, 2024

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Sep 13, 2024

This ensures that the tree cannot be in an inconsistent state.

This ensures that the tree cannot be in an inconsistent state.

Fixes pydata#9447
@shoyer shoyer requested a review from TomNicholas September 13, 2024 22:20
Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has a lot in common with #9492, but I say just merge this one first and I will update the other one to match.

Comment on lines +694 to +695
_validate_name(name) # is this check redundant?
self._name = name
Copy link
Member

@TomNicholas TomNicholas Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's redundant, at least in the sense that the original code uses the .name property setter, which now calls _validate_name. Other than that it is necessary though, because you can get down to this method by following

child._set_parent(new_parent=self, child_name=name)

which is in the .children setter.

Copy link
Member

@TomNicholas TomNicholas Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you could instead move this check to be inside the ._check_children staticmethod on TreeNode, which will get called first. I think that would be a clearer place to put the check, and you should then be able to remove the NamedNode._post_attach method entirely!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside I think this brings us closer to seeing that NamedNode is almost entirely unnecessary, if it weren't for the possibility of a root node having a name. Using TreeNode alone nodes would still have names, but those names are stored only as child keys on the parent.

So it makes sense that here we are able to refactor such that every name validation is actually done on TreeNode, with the exceptions of inside NamedNode.__init__ and the .name setter, which are the only places we could possibly check the name of the root node (as it has no parent to perform the validation via).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm was I wrong about this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does seems so... honestly I find the various hooks on the base TreeNode pretty hard to reason about

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn. I guess just put the validation check back in, merge, then we can refactor with that regression test to adhere to?

I find the various hooks on the base TreeNode pretty hard to reason about

I normally find them okay - I want to look at this issue more closely. Of all the hooks defined on TreeNode we only actually use like 2 of them, the rest are holdovers from the original design copying the anytree library.

xarray/core/treenode.py Outdated Show resolved Hide resolved
xarray/core/treenode.py Outdated Show resolved Hide resolved
@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Sep 14, 2024
@shoyer shoyer merged commit 8db6bc9 into pydata:main Sep 14, 2024
28 checks passed
TomNicholas added a commit to TomNicholas/xarray that referenced this pull request Sep 16, 2024
TomNicholas added a commit to TomNicholas/xarray that referenced this pull request Sep 16, 2024
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main: (26 commits)
  Forbid modifying names of DataTree objects with parents (pydata#9494)
  DAS-2155 - Merge datatree documentation into main docs. (pydata#9033)
  Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378)
  Ensure TreeNode doesn't copy in-place (pydata#9482)
  `open_groups` for zarr backends (pydata#9469)
  Update pyproject.toml (pydata#9484)
  New whatsnew section (pydata#9483)
  Release notes for v2024.09.0 (pydata#9480)
  Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451)
  Rename DataTree's "ds" and "data" to "dataset" (pydata#9476)
  Update DataTree repr to indicate inheritance (pydata#9470)
  Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460)
  Repo checker (pydata#9450)
  Add days_in_year and decimal_year to dt accessor (pydata#9105)
  remove parent argument from DataTree.__init__ (pydata#9465)
  Fix inheritance in DataTree.copy() (pydata#9457)
  Implement `DataTree.__delitem__` (pydata#9453)
  Add ASV for datatree.from_dict (pydata#9459)
  Make the first argument in DataTree.from_dict positional only (pydata#9446)
  Fix typos across the code, doc and comments (pydata#9443)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main:
  Opt out of floor division for float dtype time encoding (pydata#9497)
  fixed formatting for whats-new (pydata#9493)
  Forbid modifying names of DataTree objects with parents (pydata#9494)
  DAS-2155 - Merge datatree documentation into main docs. (pydata#9033)
  Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378)
  Ensure TreeNode doesn't copy in-place (pydata#9482)
  `open_groups` for zarr backends (pydata#9469)
  Update pyproject.toml (pydata#9484)
  New whatsnew section (pydata#9483)
hollymandel pushed a commit to hollymandel/xarray that referenced this pull request Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consistency of names of stored children / dataarrays with keys in wrapping object
2 participants