Docker SBOM dependency research #635

Open
andrew opened this issue Mar 18, 2025 · 2 comments

Comments

@andrew
Member

andrew commented Mar 18, 2025

A bit of a placeholder for some research ideas we could pursue using the collection of SBOMs I've mined from all public Docker images.

We have download counts for each image (not broken down by version) and all the dependencies (with versions) detected within them, plus some OS data and the Syft version used.

  • which operating systems (and which versions) are most used
  • most used dependencies within an ecosystem
  • most used dependencies across all ecosystems
  • which dependencies are used together the most across ecosystems (nokogiri and libxml2 for example)
  • most used versions of popular dependencies
  • extremely outdated versions of dependencies that are highly used
  • cross-reference with security advisories to find the most used versions that have known vulnerabilities
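A couple of the items above (most used dependencies, and dependencies used together across ecosystems) reduce to simple counting once the SBOM data is loaded. The sketch below is only an illustration under assumptions: the record layout (`downloads` plus a set of `(ecosystem, name)` pairs per image) and the function names are hypothetical, not the actual schema of the mined data. Because download counts are per image rather than per version, "usage" here means the sum of downloads of every image that contains the dependency.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, simplified records: one per image, with the image's total
# download count and the set of (ecosystem, name) dependencies found in its SBOM.
images = [
    {"downloads": 1000, "deps": {("gem", "nokogiri"), ("deb", "libxml2"), ("deb", "zlib1g")}},
    {"downloads": 500, "deps": {("gem", "nokogiri"), ("deb", "libxml2")}},
    {"downloads": 200, "deps": {("npm", "lodash"), ("deb", "zlib1g")}},
]


def weighted_usage(images):
    """Sum image download counts for every dependency present in the image."""
    usage = Counter()
    for image in images:
        for dep in image["deps"]:
            usage[dep] += image["downloads"]
    return usage


def cross_ecosystem_pairs(images):
    """Count how many images contain each pair of dependencies from different ecosystems."""
    pairs = Counter()
    for image in images:
        # Sorting makes each pair order-independent, so (a, b) and (b, a) merge.
        for a, b in combinations(sorted(image["deps"]), 2):
            if a[0] != b[0]:  # only keep pairs that span two ecosystems
                pairs[(a, b)] += 1
    return pairs


usage = weighted_usage(images)
pairs = cross_ecosystem_pairs(images)
print(usage[("gem", "nokogiri")])                          # 1500
print(pairs[(("deb", "libxml2"), ("gem", "nokogiri"))])    # 2
```

Counting pairs per image keeps the co-occurrence numbers independent of download skew; weighting pairs by downloads instead would be a one-line change if that turns out to be more useful.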

cc @joshbressers, feel free to add ideas.

Also noting some infra things that would be good to do around the same time:

  • move SBOMs into a separate table
  • spin up a separate server for the mining, leaving one for the database and web app
  • increase the rate of mining Docker metadata in packages.ecosyste.ms, as the rate limits changed recently

@joshbressers

A few things I'm curious about:

  • Exceptionally large files in images
  • The most frequently seen files, by SHA
  • Amount of ecosystem mixing in images (e.g. how many have both Python and npm packages)
  • Known sus filenames (like id_rsa)
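Ecosystem mixing can be measured directly from the package types recorded per image. The sketch below assumes a hypothetical mapping from image name to the list of package type strings found in its SBOM (Syft labels packages with types such as `python`, `npm`, `deb`, and `apk`, but the input layout and function names here are illustrative only).

```python
from collections import Counter

# Hypothetical input: image name -> package type of every package in its SBOM.
image_ecosystems = {
    "image-a": ["python", "python", "deb"],
    "image-b": ["python", "npm", "deb"],
    "image-c": ["npm", "apk"],
}


def mixing_summary(image_ecosystems):
    """Map N -> number of images containing exactly N distinct ecosystems."""
    return Counter(len(set(types)) for types in image_ecosystems.values())


def images_with_all(image_ecosystems, required):
    """Sorted names of images whose SBOM includes every ecosystem in `required`."""
    required = set(required)
    return sorted(
        name for name, types in image_ecosystems.items() if required <= set(types)
    )


print(mixing_summary(image_ecosystems))                       # Counter({2: 2, 3: 1})
print(images_with_all(image_ecosystems, {"python", "npm"}))   # ['image-b']
```

Whether OS package types (`deb`, `apk`) should count toward "mixing" is a judgment call; filtering them out before calling `set(types)` would restrict the measure to language ecosystems.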

@andrew
Member Author

andrew commented Mar 18, 2025

Known sus filenames (like id_rsa)

Do you know of a list of those kinds of filenames? I'd also be interested in looking for them in source repositories.

Amount of ecosystem mixing in images (like how many have python and npm packages)

Yeah, this is very interesting.

Exceptionally large files in images / The most seen files based on SHA

Does Syft look at any of that, or do we need to do extra analysis? Do you know of any existing tools that can extract that from a Docker image without needing to run it?

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants