How to handle empty DataCollection objects? #235
Comments
@kakhahmed just hit this as well. If there are sources selected but no trains, maybe we should keep one file for each source, so you can still use things like If there are no sources selected, it's less clear. Maybe just keep one arbitrary file open for
Given the mechanism to have a source-less I can see the train-less
We already raise an error for a glob pattern that doesn't match anything, but passing an empty list or dict to sel = run.select([(s, '*') for s in run.all_sources if 'PNCCD' in s]) Maybe it's OK to allow a DataCollection with no sources - it's only
My point is that a train-less In your example, I would rather say an exception should be raised if there is no pnCCD, rather than when you're matching down to trains and there just happen to be none. When I select sources, I expect them to be there and quite likely will hardcode access to them. It is different with trains, where most likely an iteration follows. The most frequent exception I can think of is something like
Sorry, I was writing too quickly. I think it would be reasonable in isolation to disallow making a DataCollection with no sources. But I think it's plausible that people are already doing that, and throwing an exception will break their code in some way, which I try fairly hard to avoid. It's not a totally hard rule, but I know that people lose trust fast when an update breaks what they're doing.
Hmm, but that answers the initial question immediately: Make all access in
I think I'm sold that sources-but-no-trains should be valid & working. When there are no sources, I'm still undecided between:
I'm leaning towards 2 - less risk of breaking things, but less special casing required. But I'm open to being persuaded either that this is a corner case which we can reasonably break, or that it's important enough that we should make it work properly.
While thinking about other clever tricks to preserve the functionality, I was reminded of another unfortunate angle: There are files out there in the wild which conform to the European XFEL file structure, but are entirely empty of trains and sources (mostly legacy calibration files). So yes, option 3 of disallowing it is not an option, I fear. Option 2 makes sense 👍
2 seems to be the safest option. Would it make sense to have a better error message saying that the DataCollection is empty, or something like that? Instead of
#244 should resolve this for selecting 0 trains.
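The suggestion above of a clearer error message for an empty collection could look roughly like the sketch below. It is not taken from the library: `EmptyCollectionError` and `first_file` are hypothetical names introduced only for illustration, and the list of file names stands in for whatever a collection holds internally.

```python
class EmptyCollectionError(ValueError):
    """Hypothetical exception type, introduced here for illustration only."""


def first_file(files, what="run metadata"):
    # Hypothetical helper: guard the first-file access so the caller is told
    # that the collection is empty, instead of seeing a bare indexing error.
    if not files:
        raise EmptyCollectionError(
            f"Cannot read {what}: this DataCollection is empty (no files selected)"
        )
    return files[0]


try:
    first_file([])
except EmptyCollectionError as exc:
    message = str(exc)
```

Subclassing `ValueError` is one possible choice; code that already catches a different exception type from such an access would still need adapting.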
While the construction methods mostly prevent an empty DataCollection object (i.e. no files) from existing, it is still possible to obtain one later through selection mechanisms such as DataCollection.deselect('*') or DataCollection.select_trains(np.s_[[]]).

Unfortunately there is now at least one public API, DataCollection.run_metadata(), that fails in such a case. The only other immediate place I found that uses files[0] is SourceData.__getitem__, which seems impossible to reach in such a case.

This raises the question: Should such an object be allowed to exist, i.e. all APIs must be able to handle it, or should we prevent its existence in the first place?
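The failure mode described here can be reproduced with a toy model. This is a sketch under stated assumptions, not the library's implementation: `ToyCollection` is a hypothetical stand-in, each "file" is a plain dict, and the sketch assumes that a selection drops files left with no trains.

```python
class ToyCollection:
    """Hypothetical stand-in for DataCollection; not the real implementation."""

    def __init__(self, files):
        # Each "file" is a dict holding its train IDs and some metadata.
        self.files = list(files)

    def select_trains(self, keep):
        # Keep only the requested trains. Files left without any train are
        # dropped (an assumption of this sketch), so the result can have
        # zero files.
        kept = []
        for f in self.files:
            trains = [t for t in f["train_ids"] if t in keep]
            if trains:
                kept.append({**f, "train_ids": trains})
        return ToyCollection(kept)

    def run_metadata(self):
        # Reads from the first file, so it assumes at least one file exists.
        return self.files[0]["metadata"]


run = ToyCollection([
    {"train_ids": [1000, 1001], "metadata": {"runNumber": 1}},
    {"train_ids": [1002, 1003], "metadata": {"runNumber": 1}},
])
empty = run.select_trains([])  # analogous to selecting zero trains

try:
    empty.run_metadata()
except IndexError as exc:
    failure = exc  # the unguarded files[0] access fails on the empty object
```

In this model the object is constructed without complaint and only fails later, at the first method that indexes into `files`, which is why the choice is between guarding every such access and refusing to create the object.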