Dear AI4Science team,
Thank you very much for your excellent work and for sharing the PoseBusters dataset!
I'm currently trying to reproduce the bioassembly files for the PoseBusters dataset using the `/scripts/prepare_training_data.py` script together with the provided `posebusters_mmcif` files. However, I've encountered some discrepancies:
- The generated CSV file does not match the one you provided; in particular, the `num_prot_chains` values appear to be consistently higher.
- The generated bioassembly set only includes 302 of the 308 unique molecules from the original test set; six structures could not be reconstructed successfully.
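For reference, the kind of comparison behind these two observations can be sketched roughly as below. This is only an illustration: the `pdb_id` column name and the toy rows are assumptions made up for the example (only `num_prot_chains` is a column name taken from the actual CSV), and the real files would be opened from disk instead of from in-memory strings.

```python
import csv
import io


def load_rows(handle, id_col):
    """Index CSV rows by an identifier column (column name is an assumption)."""
    return {row[id_col]: row for row in csv.DictReader(handle)}


def compare_tables(generated, provided, count_col="num_prot_chains"):
    """Return IDs absent from `generated` and IDs whose counts differ.

    `differing` maps each ID to a (generated_value, provided_value) pair.
    """
    missing = sorted(set(provided) - set(generated))
    differing = {}
    for key in sorted(set(provided) & set(generated)):
        gen_count = int(generated[key][count_col])
        ref_count = int(provided[key][count_col])
        if gen_count != ref_count:
            differing[key] = (gen_count, ref_count)
    return missing, differing


# Toy stand-ins for the provided and regenerated CSV files (hypothetical IDs).
provided_csv = io.StringIO("pdb_id,num_prot_chains\n1abc,1\n2def,2\n3ghi,1\n")
generated_csv = io.StringIO("pdb_id,num_prot_chains\n1abc,2\n2def,2\n")

provided = load_rows(provided_csv, "pdb_id")
generated = load_rows(generated_csv, "pdb_id")
missing, differing = compare_tables(generated, provided)

print("missing from generated set:", missing)
print("num_prot_chains mismatches (generated, provided):", differing)
```

With the real files, `missing` would list the six structures that could not be reconstructed, and `differing` would show where the regenerated chain counts are higher than the provided ones.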
Would it be possible for you to share more details about the process or specific parameters you used to prepare the dataset? Are there any modifications to the script or input files that I should be aware of to reproduce the results accurately?
Thank you in advance for your help!
Best regards,
Xujun