Description
In the download modals for Datasets and Collections, please include the dataset_id and a code snippet for downloading this dataset via the Census API.
Context
Use case: today I wanted to pre-filter the Tabula Sapiens dataset based on metadata found in .obs before downloading the count matrix. This is useful because I'm working on my local laptop, and the count data is large-ish, whereas I only actually need a small fraction of it.
In theory, this should be easy because Census provides a very nice cellxgene_census.get_obs function, which can be run something like this: cellxgene_census.get_obs(census, organism, value_filter="dataset_id == 'foo'").
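For illustration, here is a minimal sketch of that pre-filtering step. The helper names (dataset_filter, fetch_obs), the 'foo' placeholder ID, and the chosen columns are my own assumptions for the example; only open_soma and get_obs come from cellxgene_census.

```python
def dataset_filter(dataset_id, extra=None):
    """Build a value filter restricted to one dataset (hypothetical helper)."""
    clause = f"dataset_id == '{dataset_id}'"
    # Optionally narrow further, e.g. extra="tissue == 'lung'".
    return f"{clause} and ({extra})" if extra else clause


def fetch_obs(dataset_id, organism="Homo sapiens", extra=None):
    """Fetch only the .obs metadata for one dataset (needs network access)."""
    import cellxgene_census  # imported lazily so dataset_filter works offline

    with cellxgene_census.open_soma() as census:
        return cellxgene_census.get_obs(
            census,
            organism,
            value_filter=dataset_filter(dataset_id, extra),
            # Assumed columns of interest; adjust to the metadata you need.
            column_names=["soma_joinid", "cell_type", "tissue"],
        )
```

Calling fetch_obs("foo", extra="tissue == 'lung'") would return a pandas DataFrame of matching cells without pulling the count matrix, so the slice can be inspected before any large download.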
However, this dataset ID is impossible to find unless you query all dataset_id values in the Census and filter based on the collection_name. (H/T to @ebezzi for helping me figure out this workaround!)
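For anyone else hitting this, the workaround can be sketched roughly as follows. The function names and the "tabula sapiens" search string are assumptions for the example; the census_info/datasets table and its dataset_id, dataset_title, and collection_name columns are what Census exposes.

```python
def load_dataset_records():
    """Read the Census dataset table as a list of dicts (needs network access)."""
    import cellxgene_census  # imported lazily so the filter below works offline

    with cellxgene_census.open_soma() as census:
        table = census["census_info"]["datasets"].read().concat()
    return table.to_pylist()


def find_dataset_ids(records, collection_substring):
    """Return (dataset_id, dataset_title) pairs whose collection_name matches."""
    needle = collection_substring.lower()
    return [
        (r["dataset_id"], r["dataset_title"])
        for r in records
        if needle in r["collection_name"].lower()
    ]


# Usage (requires network access):
# for dataset_id, title in find_dataset_ids(load_dataset_records(), "tabula sapiens"):
#     print(dataset_id, title)
```

A collection can contain several datasets, so the title is returned alongside each ID to help pick the right one.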
Impact
I usually browse datasets online, and then download via notebook so I can be more precise about which slices of the data I actually need. Making this more seamless would save me a lot of headaches when tracking down the data I want once I'm ready to download.
Alternatives you've considered
I really don't think we surface this dataset_id anywhere visible online. I even checked the dataset info box in Explorer. Maybe I'm just missing something? :)
Ideal behavior
In the modal, replace:
old:
Individual datasets and their versions may also be downloaded programmatically using the Discover API.
new:
To download this dataset via the Census API, use this Python snippet:
cellxgene_census.get_anndata(census, organism, obs_value_filter="dataset_id == 'foo'")
