Low clonotype group sizes and interpretation help - 10x VDJ #1916

Open
dgagler opened this issue Mar 13, 2025 · 5 comments

@dgagler

dgagler commented Mar 13, 2025

Expected Result

Hi all, thanks for the great software. This is my first time using it, and it's very clean and well documented. I'm running 10x VDJ data from human tumor samples through MiXCR v4.7 and having some trouble interpreting the clonotype results. The whole pipeline runs without failure and the alignment seems to perform well, aligning about 90% of reads and largely assigning them to CDR3. FastQC shows the sequencing data quality is also good.

When looking at the results of the cell grouping, however, I notice that the number of cells per clone is very low for most of my samples, with the most dominant clone in each sample containing about 400-2500 cells. Biologically, as this is clonally proliferative cancer data, I'd expect significantly larger numbers, especially given the high quality of the sequencing data.

QC reports vary a bit by sample, but in several cases the QC check of the .clna file is fine, whereas the QC of the grouped.clns file shows some issues, suggesting that something is going wrong during clonotype assembly. Looking deeper, for 5 of 6 samples the assemble.report.txt shows that only 12-45% of reads are being used for clonotypes.
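For comparing that percentage across samples, a small shell sketch like the one below may help. It assumes each assemble report contains a line of the form `Reads used in clonotypes, percent of total: 12345 (45.2%)`; that wording is an assumption about the report layout, and the function names `used_pct` and `summarize_reports` are hypothetical helpers, not part of MiXCR.

```shell
#!/usr/bin/env bash
# Sketch: pull the "reads used in clonotypes" percentage out of assemble reports.
# ASSUMPTION: the report has a line such as
#   Reads used in clonotypes, percent of total: 12345 (45.2%)
# Adjust the pattern if your report is worded differently.

used_pct() {
  # $1: path to an *.assemble.report.txt file
  # Prints the first matching percentage, or nothing if no line matches.
  sed -nE 's/.*Reads used in clonotypes[^(]*\(([0-9.]+)%\).*/\1/p' "$1" | head -n 1
}

summarize_reports() {
  # $@: report files; prints "<file><TAB><percent>" for each one
  local f
  for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(used_pct "$f")"
  done
}

# Example call (not executed here):
#   summarize_reports ./*.assemble.report.txt
```

An empty second column means the expected line was not found in that report, so the pattern would need adjusting.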

Image

Could you provide some guidance on how best to interpret these results and how I might improve the assembly process to recover larger clonotype groups? Reports for two representative samples are below: one with seemingly good QC results (CAS) and another with bad QC (MART). Thanks!

Exact MiXCR commands

mixcr analyze 10x-sc-xcr-vdj \
      --species human \
      MART_BCR_S67_L{{n}}_{{R}}_001.fastq.gz \
      MART_MiXCR \
      -Xmx156g \
      -f

Report files

MART_MiXCR.align.report.txt
MART_MiXCR.assemble.report.txt
MART_MiXCR.clonesGrouping.report.txt

CAS_BCR_testResult.align.report.txt
CAS_BCR_testResult.assemble.report.txt
CAS_MiXCR.clonesGrouping.report.txt

@mizraelson
Member

Hi, the QC results from .clna and .clns files should not differ. Can you check whether you indeed get different outputs when running it on .clns and .clna files from the same sample?
How many cells did you load on the 10x flowcell?

@dgagler
Author

dgagler commented Mar 16, 2025

Interesting. Most of my samples show variability in QC results between the .clna and .clns files. Two of my samples (MART and NEV) are actually lacking the assembledCells.clns file and throw an "Unknown file type" error when I attempt to QC the contigs.clns file. One sample (CAS) seems consistent. The remaining three look variable. Screenshots attached below.

I actually don't know how many cells were loaded, as I received this data from a collaborator, but I will try to get that information.

MART: [Image]
CAS: [Image]
NEV: [Image]
VANO: [Image]
HUBE: [Image]
GAYL: [Image]
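
To see at a glance which outputs exist for each sample, a sketch along these lines could be used. The suffixes (`clna`, `contigs.clns`, `assembledCells.clns`) are the ones named in this thread; the function name `check_outputs` and the assumption that every file is named `<prefix>.<suffix>` in one directory are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: report which output files are present for each sample prefix.
# ASSUMPTION: outputs are named <prefix>.<suffix> and live in one directory.

check_outputs() {
  # $1: output directory; remaining args: sample prefixes (e.g. MART_MiXCR)
  local dir="$1"
  shift
  local sample suffix status
  for sample in "$@"; do
    for suffix in clna contigs.clns assembledCells.clns; do
      if [ -e "$dir/$sample.$suffix" ]; then
        status=present
      else
        status=missing
      fi
      printf '%s\t%s\t%s\n' "$sample" "$suffix" "$status"
    done
  done
}

# Example call (not executed here):
#   check_outputs . MART_MiXCR CAS_MiXCR
```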

@mizraelson
Member

Is it possible to share one contigs.clns file and one grouped.clns for which you see the difference? You can send it to [email protected]

@mizraelson
Member

mizraelson commented Mar 20, 2025

Thanks! So I think I know what is going on! I believe what happened is that you initially used MiXCR v4.6 to analyze your data. Then, you updated to v4.7 and reran the command, outputting the files in the same folder.

The tricky part is that what used to be *.grouped.clns in v4.6 is now called *.assembledCells.clns in v4.7, so the old file was not overwritten. This means you are actually comparing the report from v4.7 (*.contigs.clns, *.assembledCells.clns, *.clna) with the report from v4.6 (*.grouped.clns).

This explains why the reports differ: we significantly improved the 10x preset in the latest version.
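
A quick way to spot this kind of leftover is to look for sample prefixes that have both the older and the newer file name in the same folder. The sketch below does only that; `find_mixed_outputs` is a hypothetical helper, and it relies solely on the two file names mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: list sample prefixes that have BOTH <prefix>.grouped.clns (the v4.6
# name, per this thread) and <prefix>.assembledCells.clns (the v4.7 name) in
# one directory, which suggests outputs of two runs are mixed together.

find_mixed_outputs() {
  # $1: output directory (defaults to the current directory)
  local dir="${1:-.}"
  local f prefix
  for f in "$dir"/*.grouped.clns; do
    [ -e "$f" ] || continue            # glob matched nothing
    prefix="${f%.grouped.clns}"
    if [ -e "$prefix.assembledCells.clns" ]; then
      printf '%s\n' "${prefix##*/}"    # print the bare sample prefix
    fi
  done
}

# Example call (not executed here):
#   find_mixed_outputs /path/to/output_dir
```

Writing each rerun into a fresh output directory avoids the problem in the first place, since no file from an earlier version can survive alongside the new ones.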

@dgagler
Author

dgagler commented Mar 21, 2025

Yep, that sounds about right lol. Will re-comment if results are still looking suboptimal. Thanks.
