Changes to the 2. chapter: "Single-cell RNA sequencing"! #329

Merged
merged 16 commits into from
Feb 20, 2025
Changes from 11 commits
2 changes: 1 addition & 1 deletion jupyter-book/air_repertoire/clonotype.ipynb
@@ -1990,7 +1990,7 @@
"source": [
"Dandelion defines clonotypes using a substitution model based on distances. It was created specifically to deal with the problem of somatic hypermutation in B-cells {cite}`yaari2013models` {cite}`cui2016model`. This model was available in the **Immcantation** suite as an R package {cite}`gupta2015change` {cite}`vander2014presto`. However, Dandelion makes possible to use it, avoiding the complication of moving between code languages and keeping the interoperability with *Scanpy* and *Scirpy*.\n",
"\n",
"The model was created based on the probability of a punctual nucleotide change, considering the influence of the immediate two down- and upstream nucleotides {cite}`yaari2013models`. This methodology considered all the possible different 5-mers combinations just for the synonym mutation cases, i.e., those changes where the amino acid represented by the codon is not modified {cite}`yaari2013models`.\n",
"The model was created based on the probability of a punctual nucleotide change, considering the influence of the immediate two down- and upstream nucleotides {cite}`yaari2013models`. This methodology considered all the possible different 5-mers combinations just for the synonym mutation cases, i.e., those changes where the amino acid represented by the {term}`codon` is not modified {cite}`yaari2013models`.\n",
"\n",
"Furthermore, Dandelion considers a model of substitution rates for single nucleotide instead of the 5-mer model. Therefore, all the substitutions are not changing, and they are displayed in the table below:\n",
"\n",
2 changes: 1 addition & 1 deletion jupyter-book/air_repertoire/ir_profiling.ipynb
@@ -518,7 +518,7 @@
"Even though we can detect the AIR sequence, it might not be productive, i.e., it might not form a valid AIR. Sequences, which do not result in functional AIRs, are therefore flagged as non-productive. These are usually ignored, when loading data by tools such as Scirpy, and not used for any downstream analysis.\n",
"Productive Immune receptors are defined by 10x Genomics [here](https://kb.10xgenomics.com/hc/en-us/articles/115003248383-What-are-productive-contigs-) as:\n",
"- Sequences spanning over from a V gene to a J-gene\n",
"- Having a start codon in the leading region\n",
"- Having a start {term}`codon` in the leading region\n",
"- Containing a CDR3 in the frame of the start codon.\n",
"- Do not contain a stop codon within the V-J span"
]
2 changes: 1 addition & 1 deletion jupyter-book/chromatin_accessibility/introduction.ipynb
@@ -43,7 +43,7 @@
"tags": []
},
"source": [
"As depicted above, chromatin accessibility is influenced by higher-order structure down to low-level DNA modifications. **(1)** Chromatin scaffolding driven by scaffold/matrix attachment regions (S/MARs) and proteins in the nuclear periphery such as nuclear pore complexes (NPCs) or lamins influences chromatin compactness and gene expression {cite}`atac:narwade_mapping_2019, atac:buchwalter_coaching_2019`. **(2, 3)** More local accessibility often referred to as densly packed heterochromatin versus open euchromatin can be actively controlled by ATP-dependent and ATP-independent chromatin remodeling complexes and histone modifications such as acetylation, methylation and phosphorylation. **(4)** Also the binding of transcription factors can influence nucleosome positioning and lead to the recruitment of histone-modifying enzymes and chromatin remodelers. **(5)** On a DNA level, methylation of CpG sites influences the binding affinity of various proteins including transcription factors and histone-modifying enzymes which combined leads to the silencing of the corresponding genomic regions. For an animated visualization we also recommend [this 2 minute video](https://www.youtube.com/watch?v=XelGO582s4U) on epigenetics and the regulation of gene activity (credits to Nicole Ethen from the SQE, University of Illinois). For a comprehensive and up-to-date review on genome regulation and TF activity, we refer to {cite}`atac:isbel_generating_2022`.\n",
"As depicted above, chromatin accessibility is influenced by higher-order structure down to low-level DNA modifications. **(1)** Chromatin scaffolding driven by scaffold/matrix attachment regions (S/MARs) and proteins in the nuclear periphery such as nuclear pore complexes (NPCs) or lamins influences chromatin compactness and gene expression {cite}`atac:narwade_mapping_2019, atac:buchwalter_coaching_2019`. **(2, 3)** More local accessibility often referred to as densly packed heterochromatin versus open euchromatin can be actively controlled by ATP-dependent and ATP-independent chromatin remodeling complexes and histone modifications such as acetylation, methylation and phosphorylation. **(4)** Also the binding of transcription factors can influence nucleosome positioning and lead to the recruitment of histone-modifying enzymes and chromatin remodelers. **(5)** On a DNA level, methylation of {term}`CpG` sites influences the binding affinity of various proteins including transcription factors and histone-modifying enzymes which combined leads to the silencing of the corresponding genomic regions. For an animated visualization we also recommend [this 2 minute video](https://www.youtube.com/watch?v=XelGO582s4U) on epigenetics and the regulation of gene activity (credits to Nicole Ethen from the SQE, University of Illinois). For a comprehensive and up-to-date review on genome regulation and TF activity, we refer to {cite}`atac:isbel_generating_2022`.\n",
"\n",
"Taken together, an essential component defining cell identity is the regulatory state of each cell. In this chapter, we focus on chromatin accessibility data measured by the **Single-Cell Assay for Transposase-Accessible Chromatin with High-Throughput Sequencing (scATAC-seq)** or as part of the **10x Multiome assay (scATAC combined with scRNA-seq)**. \n",
"\n",
31 changes: 26 additions & 5 deletions jupyter-book/glossary.md
@@ -22,6 +22,10 @@ BAM files
BAM files are binary, compressed versions of SAM (Sequence Alignment/Map) files that store sequencing read alignments to a reference genome.
They contain the same information as {term}`SAM` files - including read sequences, quality scores, and alignment positions - but in a more space-efficient format that enables faster processing and reduced storage requirements.

Amplification bias
A distortion that occurs during DNA or RNA amplification (e.g., PCR), where certain sequences are copied more efficiently than others. This can lead to uneven or inaccurate representation of the original genetic material, affecting results in experiments like sequencing or gene expression analysis.


Barcode
Barcodes
Bar code
@@ -37,10 +41,12 @@ Benchmark
An (independent) comparison of performance of several tools with respect to pre-defined metrics.

Bulk RNA sequencing
Contrary to single-cell sequencing, bulk sequencing measures the average expression values of several cells.
Therefore, resolution is lost, but bulk sequencing is usually cheaper, less laborious and faster to analyze.
bulk RNA-Seq
bulk sequencing
Contrary to single-cell sequencing, bulk sequencing measures the average expression values of several cells. Therefore, resolution is lost, but bulk sequencing is usually cheaper, less laborious and faster to analyze.

Cell
cells
The fundamental unit of life, consisting of cytoplasm enclosed within a membrane, containing biomolecules such as proteins and nucleic acids.
Cells acquire specific functions, transition into different types, divide, and communicate to sustain an organism.
Studying cell structure, activity, and interactions enables insights into gene expression dynamics, cellular trajectories, developmental lineages, and disease mechanisms.
@@ -60,6 +66,15 @@ Cell state
Chromatin
The complex of DNA and proteins efficiently packaging the DNA inside the nucleus and involved in regulating gene expression.

Codon
A sequence of three nucleotides corresponding to a specific amino acid or a start/stop signal in protein synthesis.
Codons are the basic units of the genetic code, determining how genetic information is translated into proteins.

CpG
A DNA sequence in which a cytosine (C) is followed by a guanine (G) along the 5' → 3' direction, linked by a phosphodiester bond.
CpG sites are often found in clusters called CpG islands near gene promoters.
Unmethylated CpG sites are associated with gene activation, while methylated CpG sites can lead to gene inhibition.

Cluster
Clusters
A group of a population or data points that share similarities.
@@ -129,6 +144,12 @@ Indrop
Library
Also known as sequencing library. A pool of DNA fragments with attached sequencing adapters.

Modalities
Multimodal
Different types of biological information measured at the single-cell level.
These include gene expression, chromatin accessibility, surface proteins, immune receptor sequences, and spatial organization.
Combining these modalities provides a more complete understanding of cell identity, function, and interactions.

Locus
Loci
loci
@@ -208,10 +229,10 @@ Trajectory inference
Also known as pseudotemporal ordering.
The computational recovery of dynamic processes by ordering cells by similarity or other means.

UMI
Unique Molecular Identifier (UMI)
unique molecular identifiers (UMIs)
Specific type of molecular barcodes aiding with error correction and increased accuracy during sequencing.
UMIs unique tag molecules in sample libraries enabling estimation of PCR duplication rates.
A unique molecular identifier (UMI) is a special type of molecular barcode that uniquely tags each molecule in a sample library.
This, for example, enables the estimation of PCR duplication rates (see {term}`amplification bias`), which leads to error correction and increases accuracy.

Untranslated Region (UTR)
UTR
3 changes: 1 addition & 2 deletions jupyter-book/introduction/raw_data_processing.md
@@ -465,8 +465,7 @@ Several common strategies are used for cell barcode identification and correctio
After cell barcode (CB) correction, reads have either been discarded or assigned to a corrected CB.
Subsequently, we wish to quantify the abundance of each gene within each corrected CB.

Because of the amplification bias as discussed in {ref}`exp-data:transcript-quantification`, reads must be deduplicated, based upon their UMI, to assess the true count of sampled molecules.
Additionally, several other complicating factors present challenges when attempting to perform this estimation.
Because of the {term}`amplification bias` as discussed in {ref}`exp-data:transcript-quantification`, reads must be deduplicated, based upon their UMI, to assess the true count of sampled molecules. Additionally, several other complicating factors present challenges when attempting to perform this estimation.

The UMI deduplication step aims to identify the set of reads and UMIs derived from each original, pre-PCR molecule in each cell captured and sequenced in the experiment.
The result of this process is to allocate a molecule count to each gene in each cell, which is subsequently used in the downstream analysis as the raw expression estimate for this gene.
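
Reviewer note: the deduplication idea described in this hunk can be illustrated with a minimal Python sketch. It is not the method used by any particular tool. It assumes each read has already been assigned to exactly one corrected cell barcode and one gene, and it collapses only reads with identical UMIs, ignoring UMI sequencing errors and multi-gene reads. The function name `count_molecules` and the toy reads are hypothetical.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples.
    Reads sharing all three values are treated as PCR duplicates
    of a single pre-PCR molecule (naive exact-match deduplication).
    """
    umis_per_pair = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis_per_pair[(cell_barcode, gene)].add(umi)
    # The molecule count is the number of distinct UMIs observed.
    return {pair: len(umis) for pair, umis in umis_per_pair.items()}


# Toy input (hypothetical barcodes, UMIs, and gene names).
reads = [
    ("AAAC", "TTGA", "GeneA"),
    ("AAAC", "TTGA", "GeneA"),  # same cell, UMI, and gene: PCR duplicate
    ("AAAC", "CCGT", "GeneA"),  # different UMI: a second molecule
    ("AAAC", "TTGA", "GeneB"),  # same UMI, different gene: counted separately
    ("GGTC", "TTGA", "GeneA"),  # same UMI, different cell: counted separately
]
counts = count_molecules(reads)
```

Here the two identical reads collapse into one molecule, so `counts[("AAAC", "GeneA")]` is 2 rather than 3. Real pipelines additionally handle UMI errors and ambiguous gene assignments, which this sketch deliberately omits.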
10 changes: 5 additions & 5 deletions jupyter-book/introduction/scrna_seq.bib
@@ -38,11 +38,11 @@ @Article{Svensson2017
url={https://doi.org/10.1038/nmeth.4220}
}

@Article{JOU1972,
author={JOU, W. MIN
and HAEGEMAN, G.
and YSEBAERT, M.
and FIERS, W.},
@Article{Jou1972,
author={Jou, W. Min
and Haegeman, G.
and Ysebaert, M.
and Fiers, W.},
title={Nucleotide Sequence of the Gene Coding for the Bacteriophage MS2 Coat Protein},
journal={Nature},
year={1972},