This is a toy example that uses the `opensoundscape` package as a wrapper to create custom classifiers on top of models pre-trained to produce bird embeddings. The first models we will use are BirdNET and Google's Perch.
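To make the idea concrete before diving into the notebook: a "custom classifier" here is a small model fit on frozen embeddings. Below is a minimal sketch using scikit-learn on randomly generated stand-in embeddings (the shapes and the logistic-regression probe are illustrative assumptions, not the `opensoundscape` API):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Stand-in for real embeddings: one row per audio clip.
# (BirdNET embeddings are 1024-dimensional; Perch's are 1280-dimensional.)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1024))
y = rng.integers(0, 2, size=200)  # 1 = target species present in the clip

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A linear probe on frozen embeddings is the simplest custom classifier.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```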
We will complete the following throughout this semester:
- Learn to run the exploratory notebook `train_on_embeddings.ipynb` to create and test a custom classifier
- Create custom classifiers using ONNX-formatted pre-trained models instead of `opensoundscape` (see the first sketch after this list)
- Create an approach for using stratified k-fold cross-validation across multiple audio sample directories (see the second sketch after this list)
- Refactor the code into a Python module that runs on startup
- Send jobs to a supercomputer to iterate over various parameter settings
- Experiment with new model architectures and data augmentation techniques
- Write a report that compares approaches and results! 🚀
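For the ONNX item above, here is a minimal sketch of driving an ONNX-exported embedding model with `onnxruntime` (the file name and input shape are illustrative assumptions, not this repository's actual model):

```python
import numpy as np
import onnxruntime as ort

# Hypothetical file name; any ONNX-exported embedding model is driven the same way.
session = ort.InferenceSession("birdnet_embeddings.onnx")
input_name = session.get_inputs()[0].name

# Fake batch shaped like 3 s of 48 kHz mono audio (what BirdNET-style models expect).
batch = np.zeros((1, 144000), dtype=np.float32)

# run(None, ...) returns all outputs; unpacking assumes the model has exactly one.
(embeddings,) = session.run(None, {input_name: batch})
print(embeddings.shape)
```

And for the stratified k-fold item, a sketch of building folds across several audio directories with scikit-learn, assuming a hypothetical `data/audio/<label>/*.wav` layout where each subdirectory name is a class label:

```python
from pathlib import Path
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Collect clips and labels from every subdirectory of data/audio.
files, labels = [], []
for label_dir in Path("data/audio").iterdir():
    if not label_dir.is_dir():
        continue
    for wav in label_dir.glob("*.wav"):
        files.append(wav)
        labels.append(label_dir.name)

files, labels = np.array(files), np.array(labels)

# Stratified splits keep each fold's class balance close to the full dataset's.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(files, labels)):
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test clips")
```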
Note: This repository contains a development container that can be used locally with VS Code, in the cloud with GitHub Codespaces, or with any combination of cloud backend and IDE using DevPod!
You'll need one of the following:

- A GitHub account (for using GitHub Codespaces)

OR

- DevPod set up locally and configured with an appropriate cloud backend (more detail on this later!)
To open this repository in a GitHub Codespace:

- Click the "Code" button on the repository page
- Select "Open with Codespaces"
- Click "New codespace" (you can change the machine type here as well)
- Wait for the environment to build and initialize
To use the dev container locally with VS Code instead:

- Clone the repository:

  ```bash
  git clone https://github.com/username/repo-name.git
  cd repo-name
  ```

- Open the folder in VS Code:

  ```bash
  code .
  ```
- When prompted "Reopen in Container", click "Reopen in Container"
- Or press `Cmd+Shift+P` and type "Remote-Containers: Reopen in Container"
The repository is organized as follows:

```
├── .devcontainer/   # Development container configuration
├── .vscode/         # VS Code settings, primarily for debugger launch configs
├── data/            # Data storage - ignored by git!
│   ├── audio/...
├── exploratory/     # Jupyter notebooks for interactive work
├── src/             # Source code - sourced as a Python module (incomplete)
└── pixi.toml        # Pixi dependencies and settings
```
Data can be stored in the `data/` directory. This directory is ignored by `git`, so you can keep large files here without worrying about them being committed to the repository. This is useful for data that is too large to version, or for sensitive data that you don't want to share.
By default, the data used for this toy-ish example is downloaded from a public GCP bucket by `.devcontainer/scripts/post_create/download_input_data.sh`. This script is run by `.devcontainer/scripts/run_post_create.sh` after the container is created.
The container will automatically install all required system dependencies and Python packages during the build process.
Additional system dependencies can be added to `.devcontainer/scripts/on_build/install_system_dependencies.sh` - or, to keep things cleaner, you can break installs up across multiple scripts. These are called in order of their filenames by `.devcontainer/scripts/run_on_build.sh`. This happens during the Docker build process, so it's a good place for things like `apt-get` installs.
After the container builds, Python dependencies are installed by `pixi`, using the `pixi.toml` and `pixi.lock` files. To add a new dependency, you can either add it manually to the `pixi.toml` file or use the `pixi` CLI. For example, to add `numpy`:
```bash
pixi add numpy
```
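The equivalent manual edit adds an entry under the `[dependencies]` table in `pixi.toml` (the version spec here is just an example):

```toml
[dependencies]
numpy = ">=1.26"
```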