cz-benchmarks is a package for standardized evaluation and comparison of machine learning models for biological applications, starting with the single-cell transcriptomics domain and with plans to expand to additional domains. The package provides a toolkit for running containerized models, executing biologically relevant tasks, and computing performance metrics. We see this tool as a step toward ensuring that large-scale AI models can be harnessed to deliver genuine biological insights: building trust, accelerating development, and bridging the gap between the ML and biology communities.
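To make that workflow concrete, here is a minimal, self-contained sketch of the load-dataset, run-model, compute-metric loop. Every name in it (`load_dataset`, `run_model`, `embedding_variance`) is an illustrative placeholder, not the actual cz-benchmarks API; see the documentation linked below for the real interfaces.

```python
# Hypothetical sketch of the benchmark workflow described above:
# load a dataset, run a (containerized) model to produce embeddings,
# then score the result. All names are illustrative placeholders,
# NOT the real cz-benchmarks API.
import numpy as np

def load_dataset(name: str) -> np.ndarray:
    """Placeholder: return a cells x genes expression matrix."""
    rng = np.random.default_rng(0)
    return rng.poisson(1.0, size=(100, 50)).astype(float)

def run_model(expression: np.ndarray) -> np.ndarray:
    """Placeholder for invoking a model; here, a PCA-like projection."""
    centered = expression - expression.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:10].T  # 10-dimensional embedding

def embedding_variance(embedding: np.ndarray) -> float:
    """Placeholder metric: total variance captured by the embedding."""
    return float(embedding.var(axis=0).sum())

if __name__ == "__main__":
    X = load_dataset("example-scrnaseq")
    Z = run_model(X)
    print(f"embedding variance: {embedding_variance(Z):.2f}")
```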
Last year, CZI hosted a workshop on benchmarking and evaluating AI models in biology. The insights gained reinforced our commitment to supporting robust benchmarking infrastructure, which we see as critical to achieving our Virtual Cell vision.
We're working to stabilize the alpha version of cz-benchmarks so we can build with the community. In the meantime, if you identify an issue, please open an issue on GitHub or reach out to us at [email protected].

Tutorials are available for the following:
- Add a Custom Dataset
- Add a Custom Model
- Add a New Metric (see the sketch after this list)
- Add a New Task
- Interactive Mode
- Visualize Results
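To give a flavor of the extension points above, here is a minimal, hypothetical sketch of what registering a custom metric could look like. The `MetricRegistry` class and `register` decorator are illustrative stand-ins, not the actual cz-benchmarks API; follow the "Add a New Metric" tutorial for the real mechanism.

```python
# Hypothetical sketch of a custom-metric plug-in. MetricRegistry and
# register are illustrative stand-ins, NOT the actual cz-benchmarks API.
from typing import Callable, Dict
import numpy as np

class MetricRegistry:
    """Toy registry mapping metric names to callables."""
    _metrics: Dict[str, Callable[..., float]] = {}

    @classmethod
    def register(cls, name: str):
        def decorator(fn: Callable[..., float]) -> Callable[..., float]:
            cls._metrics[name] = fn
            return fn
        return decorator

    @classmethod
    def compute(cls, name: str, *args, **kwargs) -> float:
        return cls._metrics[name](*args, **kwargs)

@MetricRegistry.register("mean_squared_error")
def mean_squared_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Example metric: mean squared error between predictions and targets."""
    return float(np.mean((y_true - y_pred) ** 2))

if __name__ == "__main__":
    y_true = np.array([0.0, 1.0, 2.0])
    y_pred = np.array([0.1, 0.9, 2.2])
    print(MetricRegistry.compute("mean_squared_error", y_true, y_pred))
```

A registry-plus-decorator pattern like this is one common way benchmarking frameworks let users add metrics without modifying core code; whether cz-benchmarks uses this exact pattern is an assumption here.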
Full documentation: cz-benchmarks website
Find the package on PyPI