Support flattened dmrpp files. #581

Open
betolink opened this issue Apr 29, 2025 · 2 comments


betolink commented Apr 29, 2025

Some dmrpp files created by OPeNDAP override the hierarchical structure of the HDF5/NetCDF format and flatten it. When they do that, some dimensions get assigned names like phony_dim_1, phony_dim_2, etc., and variables are not parsed correctly.

ICESat-2: https://data.nsidc.earthdatacloud.nasa.gov/nsidc-cumulus-prod-protected/ATLAS/ATL06/006/2020/01/02/ATL06_20200102190333_01080603_006_01.h5.dmrpp
SMAP: https://data.nsidc.earthdatacloud.nasa.gov/nsidc-cumulus-prod-protected/SMAP/SPL4SMGP/007/2023/12/31/SMAP_L4_SM_gph_20231231T223000_Vv7031_001.h5.dmrpp

According to @Mikejmnez, this wouldn't be a heavy lift, and it would allow us to support more collections (until they fix the dmrpp generation). cc @danielfromearth @ayushnag


Mikejmnez commented Apr 30, 2025

I will take a look at this. @betolink can you write a minimal example that reproduces the error? It will help me understand if the issue is the dmr++ itself, the parser, or the dataset (there may be a combination of those things).

Flattening is an option within the building process of the dmr++. Flattened dmr++s used to be produced (relatively recently) because many other NASA tools that use dmr++s (and clients talking directly to Hyrax data servers) could not understand/parse Groups. And so those dmr++s flattened the access to those files, even though the original file was not flat. Now many of the same NASA tools are compatible with Groups, along with client APIs, and so the newer dmr++s do not necessarily need to be flattened, and some of the DAACs are choosing this route.

And so the problem may be neither the parser itself nor the dmr++, but rather an error that arises when trying to access a hierarchical file as if it were flat (following the dmr++ route).
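A quick way to tell which case a given dmr++ falls into is to look at the XML directly. The snippet below is only a sketch, not the parser's logic: it assumes the document uses the DAP4 namespace with `Group` and `Dimension` elements, and `SAMPLE_DMR` and `summarize_dmr` are hypothetical names with made-up content, not taken from the files linked above.

```python
import xml.etree.ElementTree as ET

# DAP4 namespace used by DMR / dmr++ documents.
DAP4_NS = "http://xml.opendap.org/ns/DAP/4.0#"

# Hypothetical, heavily trimmed DMR-style document (illustration only).
SAMPLE_DMR = """
<Dataset xmlns="http://xml.opendap.org/ns/DAP/4.0#" name="example.h5">
    <Dimension name="phony_dim_1" size="10"/>
    <Dimension name="phony_dim_2" size="20"/>
    <Float32 name="h_li">
        <Dim name="/phony_dim_1"/>
    </Float32>
    <Float32 name="soil_moisture">
        <Dim name="/phony_dim_1"/>
        <Dim name="/phony_dim_2"/>
    </Float32>
</Dataset>
"""


def summarize_dmr(xml_text):
    """Report whether a DMR has Groups and which dimensions look phony."""
    root = ET.fromstring(xml_text.strip())
    # Collect every Group and Dimension element at any depth.
    groups = [g.get("name") for g in root.iter(f"{{{DAP4_NS}}}Group")]
    dims = [d.get("name") for d in root.iter(f"{{{DAP4_NS}}}Dimension")]
    phony = [n for n in dims if n and n.startswith("phony_dim")]
    return {
        "has_groups": bool(groups),  # False suggests a flattened document
        "groups": groups,
        "dimensions": dims,
        "phony_dimensions": phony,
    }


summary = summarize_dmr(SAMPLE_DMR)
print(summary)
```

If `has_groups` is False for a file that is known to be hierarchical, the dmr++ was most likely built with the flattening option.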

phony_dims

The presence of {phony_dim_1, phony_dim_2, ..., phony_dim_N} means that the original file does not have named dimensions, neither global nor local. The same option that flattens the dmr++ also creates these missing named dimensions. That is probably not the error that @betolink is finding, but it is one that any user will stumble upon when the dmr++ gets updated and is no longer flattened. I have run into dmr++s that are not flattened and that do not have named dimensions. The current dmr++ parser errors when it cannot find a name for a dimension. Even when I attempt to create a dataset with xarray talking to the cloud OPeNDAP server, creating the xarray dataset fails because there is a mismatch between the shape of the arrays and the number of (named) dimensions.
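To illustrate the shape/dimension mismatch, here is one possible fallback for files without named dimensions: give every axis a placeholder name so that each variable ends up with exactly as many dimension names as it has axes. This is a sketch under assumptions, not what the current parser or OPeNDAP does. The function name `assign_phony_dims`, the example variable names, and the rule "axes of equal length share a name" are all hypothetical choices, and the size-based rule can wrongly merge unrelated axes that happen to have the same length.

```python
from collections import defaultdict


def assign_phony_dims(shapes, prefix="phony_dim"):
    """Map each variable to one placeholder dimension name per axis.

    `shapes` maps variable name -> tuple of axis lengths.
    Axes of equal length share a name across variables (an assumption),
    but a name is never repeated within a single variable.
    """
    names_by_size = defaultdict(list)  # axis length -> names created so far
    counter = 0
    assigned = {}
    for var, shape in shapes.items():
        seen = defaultdict(int)  # how often each length occurred in this variable
        dims = []
        for size in shape:
            k = seen[size]
            if k == len(names_by_size[size]):
                # No unused name of this length left for this variable: make one.
                names_by_size[size].append(f"{prefix}_{counter}")
                counter += 1
            dims.append(names_by_size[size][k])
            seen[size] += 1
        assigned[var] = tuple(dims)
    return assigned


# Hypothetical shapes, for illustration only.
shapes = {"lat": (10,), "lon": (20,), "sm": (10, 20), "cov": (10, 10)}
dims = assign_phony_dims(shapes)
for name, shape in shapes.items():
    # Every variable now has as many dimension names as axes.
    print(name, shape, dims[name])
```

With names assigned this way, the number of dimension names always matches the array rank, which is the condition that fails in the xarray case described above.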

Contributor

ayushnag commented May 1, 2025

There is logic in the parser to handle phony_dims; however, as you said, it may be a combination of factors that is causing it to not work for this dataset.


4 participants