Help Needed to Reproduce PoseBusters Bioassemblies Correctly

Dear AI4Science team,

Thank you very much for your excellent work and for sharing the PoseBusters dataset!

I'm currently trying to reproduce the bioassembly files for the PoseBusters dataset using the `/scripts/prepare_training_data.py` script together with the provided `posebusters_mmcif` files. However, I’ve encountered some discrepancies:

- The generated CSV file does not match the one you provided; in particular, the `num_prot_chains` values appear to be consistently higher.
- The generated bioassembly set includes only 302 of the 308 unique molecules from the original test set; six structures could not be reconstructed successfully.
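
A minimal sketch of how the two discrepancies above could be checked side by side. Only the `num_prot_chains` column name comes from the CSV discussed here; the `pdb_id` column name, the helper functions, and the tiny inline tables are hypothetical stand-ins for the provided and regenerated files.

```python
import csv
import io


def load_chain_counts(handle, id_col="pdb_id", count_col="num_prot_chains"):
    """Map each entry ID to its num_prot_chains value.

    `id_col` is an assumed column name; adjust it to the real CSV header.
    """
    return {row[id_col]: int(row[count_col]) for row in csv.DictReader(handle)}


def compare_chain_counts(reference, generated):
    """Return (missing_ids, mismatches) between two {id: count} mappings."""
    # Entries present in the reference CSV but absent from the regenerated one.
    missing = sorted(set(reference) - set(generated))
    # Entries present in both, with (reference, generated) counts that differ.
    mismatches = {
        entry: (reference[entry], generated[entry])
        for entry in sorted(set(reference) & set(generated))
        if reference[entry] != generated[entry]
    }
    return missing, mismatches


# Made-up rows for illustration only; replace with open("...") on the real files.
reference_csv = io.StringIO("pdb_id,num_prot_chains\n1abc,1\n2def,2\n3ghi,1\n")
generated_csv = io.StringIO("pdb_id,num_prot_chains\n1abc,2\n2def,2\n")

missing, mismatches = compare_chain_counts(
    load_chain_counts(reference_csv), load_chain_counts(generated_csv)
)
print("missing:", missing)
print("mismatched:", mismatches)
```

Running the same comparison on the real files would list the six missing entries and show, per entry, how far the regenerated `num_prot_chains` deviates from the provided value.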

Would it be possible for you to share more details about the process or specific parameters you used to prepare the dataset? Are there any modifications to the script or input files that I should be aware of to reproduce the results accurately?

Thank you in advance for your help!

Best regards,  
Xujun

Help Needed to Reproduce PoseBusters Bioassemblies Correctly #122
