Help Needed to Reproduce PoseBusters Bioassemblies Correctly #122

Open
@schrojunzhang


Dear AI4Science team,

Thank you very much for your excellent work and for sharing the PoseBusters dataset!

I'm currently trying to reproduce the bioassembly files for the PoseBusters dataset using the /scripts/prepare_training_data.py script together with the provided posebusters_mmcif files. However, I’ve encountered some discrepancies:

  • The generated CSV file does not match the one you provided; in particular, the num_prot_chains values in my output are consistently higher than in yours.
  • The generated bioassembly set only includes 302 of the 308 unique molecules from the original test set; the remaining six structures could not be reconstructed.
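
A minimal sketch of how the two discrepancies above could be pinned down entry by entry. It uses only the Python standard library. The `pdb_id` column name and the tiny inline tables are assumptions for illustration, not the actual layout of the provided CSV; only `num_prot_chains` is taken from the files discussed here.

```python
import csv
import io


def load_rows(handle, id_col="pdb_id"):
    """Map each entry ID to its CSV row. `id_col` is an assumed column name."""
    return {row[id_col]: row for row in csv.DictReader(handle)}


def compare_indices(reference, generated, col="num_prot_chains"):
    """Return (missing_ids, mismatches) for two ID -> row mappings.

    missing_ids: IDs present in the reference but absent from the generated set.
    mismatches:  ID -> (reference value, generated value) where `col` differs.
    """
    missing = sorted(set(reference) - set(generated))
    mismatches = {}
    for key in sorted(set(reference) & set(generated)):
        ref_val = int(reference[key][col])
        gen_val = int(generated[key][col])
        if ref_val != gen_val:
            mismatches[key] = (ref_val, gen_val)
    return missing, mismatches


# Made-up stand-ins for the provided and the regenerated CSV files.
ref_csv = "pdb_id,num_prot_chains\n1abc,1\n2def,2\n3ghi,1\n"
gen_csv = "pdb_id,num_prot_chains\n1abc,2\n2def,2\n"

reference = load_rows(io.StringIO(ref_csv))
generated = load_rows(io.StringIO(gen_csv))
missing, mismatches = compare_indices(reference, generated)

print("missing from generated set:", missing)
print("num_prot_chains differences:", mismatches)
```

With real files, the two `io.StringIO` objects would be replaced by `open(path, newline="")` handles, and `id_col` by whatever identifier column the CSV actually uses.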

Would it be possible for you to share more details about the process or specific parameters you used to prepare the dataset? Are there any modifications to the script or input files that I should be aware of to reproduce the results accurately?

Thank you in advance for your help!

Best regards,
Xujun
