Add the dataset ID to the download modals for easier web <-> API transitions #7423

Open
@sidneymbell

Description

In the download modals for Datasets and Collections, please include the dataset_id and a code snippet for downloading this dataset via the Census API.

Context

Use case: today I wanted to pre-filter the Tabula Sapiens dataset based on metadata found in .obs before downloading the count matrix. This is useful because I'm working on my local laptop and the count data is large-ish, whereas I only need a small fraction of it.

In theory, this should be easy because Census provides a very nice cellxgene_census.get_obs function, which can be run something like this: cellxgene_census.get_obs(census, "homo_sapiens", value_filter="dataset_id == 'foo'").
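For reference, here is a minimal sketch of that pre-filtering step. The helper names (`dataset_obs_filter`, `fetch_obs`) and the example `tissue_general` clause are hypothetical, made up for illustration; the sketch assumes a recent `cellxgene_census` release where `get_obs` takes the open Census handle, an organism, and a `value_filter` string.

```python
def dataset_obs_filter(dataset_id, extra=None):
    """Build a value_filter string for one dataset, optionally narrowed further."""
    clause = f"dataset_id == '{dataset_id}'"  # string values must be quoted
    return f"{clause} and ({extra})" if extra else clause


def fetch_obs(dataset_id, organism="homo_sapiens", extra=None, columns=None):
    """Return the matching obs rows as a pandas DataFrame (needs network access)."""
    # Imported lazily so the filter helper above works without the package.
    import cellxgene_census

    with cellxgene_census.open_soma() as census:
        return cellxgene_census.get_obs(
            census,
            organism,
            value_filter=dataset_obs_filter(dataset_id, extra),
            column_names=columns,  # None returns all obs columns
        )
```

Something like `fetch_obs("foo", extra="tissue_general == 'lung'")` would then return only the metadata rows of interest, without touching the count matrix.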

However, this dataset ID is impossible to find unless you query all dataset_id values in the Census and filter based on the collection_name. (H/T to @ebezzi for helping me figure out this workaround!)
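The workaround looks roughly like the sketch below. It assumes the Census exposes its dataset table at `census["census_info"]["datasets"]` with `collection_name`, `dataset_id`, and `dataset_title` columns; the helper names `find_datasets` and `load_dataset_rows` are hypothetical.

```python
def find_datasets(rows, collection_substring):
    """Return (dataset_id, dataset_title) pairs whose collection_name matches."""
    needle = collection_substring.lower()
    return [
        (row["dataset_id"], row["dataset_title"])
        for row in rows
        if needle in row["collection_name"].lower()
    ]


def load_dataset_rows():
    """Read the Census dataset table as a list of dicts (needs network access)."""
    # Imported lazily so find_datasets can be used on any list of dicts.
    import cellxgene_census

    with cellxgene_census.open_soma() as census:
        table = census["census_info"]["datasets"].read().concat()
    return table.to_pylist()  # pyarrow Table -> list of row dicts
```

Usage would be something like `find_datasets(load_dataset_rows(), "tabula sapiens")`, followed by picking the right ID out of the results by title.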

Impact

I usually browse datasets online, and then download via notebook so I can be more precise in which slices of the data I actually need. Making this more seamless would save me a lot of headache trying to track down the data I want once I'm ready to download.

Alternatives you've considered

I really don't think we surface this dataset_id anywhere visible online. I even checked the dataset info box in Explorer. Maybe I'm just missing something? :)

Ideal behavior

In the modal, replace:
old:

Individual datasets and their versions may also be downloaded programmatically using the Discover API.

new:

To download this dataset via the Census API, use this Python snippet:
cellxgene_census.get_anndata(census, organism="homo_sapiens", obs_value_filter="dataset_id == 'foo'")
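To make the proposal concrete, here is a rough sketch of how the modal text could be templated so that the copied snippet runs as-is. `SNIPPET_TEMPLATE` and `render_snippet` are hypothetical names, not anything in the existing codebase, and the organism default is an assumption that would need to come from the dataset's own metadata.

```python
# Hypothetical template for the snippet shown in the download modal.
SNIPPET_TEMPLATE = '''\
import cellxgene_census

with cellxgene_census.open_soma() as census:
    adata = cellxgene_census.get_anndata(
        census,
        organism="{organism}",
        obs_value_filter="dataset_id == '{dataset_id}'",
    )
'''


def render_snippet(dataset_id, organism="homo_sapiens"):
    """Fill the dataset ID and organism into the copy-paste snippet."""
    return SNIPPET_TEMPLATE.format(dataset_id=dataset_id, organism=organism)
```

Calling `print(render_snippet("foo"))` shows the text a user would copy. Surfacing the bare dataset_id next to it would also cover the get_obs pre-filtering case.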
