Commit 8dc12b2

LuisHeinzlmeier, Luis, and Zethson authored
Changes to the 2. chapter: "Single-cell RNA sequencing"! (#329)
* Improve wording, insert missing spaces and symbols (, and .) and add a note field until third generation sequencing (inclusive).
* proofreading of two paragraphs
* Proofreading until the end of the chapter
* again proofreading until RNA sequencing (grammar)
* change v3 of the artifact actions to v4
* adding key takeaways and some missing terms
* improvements based on Lukas comments + added cards with internal links for the key takeaways
* change one sentence
* Update jupyter-book/introduction/scrna_seq.md

  Suggestion from Lukas 1

  Co-authored-by: Lukas Heumos <[email protected]>
* Update jupyter-book/introduction/scrna_seq.md

  Suggestion from Lukas 2

  Co-authored-by: Lukas Heumos <[email protected]>
* put key takeaways in a dropdown and made them shorter
* update terms and put further readings into a {seealso} dropdown box
* adding many terms of the glossary
* change to new anchor logic

---------

Co-authored-by: Luis <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
1 parent 41b1cb5 commit 8dc12b2

7 files changed: +318 -140 lines changed

jupyter-book/air_repertoire/clonotype.ipynb

Lines changed: 1 addition & 1 deletion
@@ -2033,7 +2033,7 @@
 "source": [
 "Dandelion defines clonotypes using a substitution model based on distances. It was created specifically to deal with the problem of somatic hypermutation in B-cells {cite}`yaari2013models` {cite}`cui2016model`. This model was available in the **Immcantation** suite as an R package {cite}`gupta2015change` {cite}`vander2014presto`. However, Dandelion makes possible to use it, avoiding the complication of moving between code languages and keeping the interoperability with *Scanpy* and *Scirpy*.\n",
 "\n",
-"The model was created based on the probability of a punctual nucleotide change, considering the influence of the immediate two down- and upstream nucleotides {cite}`yaari2013models`. This methodology considered all the possible different 5-mers combinations just for the synonym mutation cases, i.e., those changes where the amino acid represented by the codon is not modified {cite}`yaari2013models`.\n",
+"The model was created based on the probability of a punctual nucleotide change, considering the influence of the immediate two down- and upstream nucleotides {cite}`yaari2013models`. This methodology considered all the possible different 5-mers combinations just for the synonym mutation cases, i.e., those changes where the amino acid represented by the {term}`codon` is not modified {cite}`yaari2013models`.\n",
 "\n",
 "Furthermore, Dandelion considers a model of substitution rates for single nucleotide instead of the 5-mer model. Therefore, all the substitutions are not changing, and they are displayed in the table below:\n",
 "\n",

jupyter-book/air_repertoire/ir_profiling.ipynb

Lines changed: 1 addition & 1 deletion
@@ -564,7 +564,7 @@
 "Even though we can detect the AIR sequence, it might not be productive, i.e., it might not form a valid AIR. Sequences, which do not result in functional AIRs, are therefore flagged as non-productive. These are usually ignored, when loading data by tools such as Scirpy, and not used for any downstream analysis.\n",
 "Productive Immune receptors are defined by 10x Genomics [here](https://kb.10xgenomics.com/hc/en-us/articles/115003248383-What-are-productive-contigs-) as:\n",
 "- Sequences spanning over from a V gene to a J-gene\n",
-"- Having a start codon in the leading region\n",
+"- Having a start {term}`codon` in the leading region\n",
 "- Containing a CDR3 in the frame of the start codon.\n",
 "- Do not contain a stop codon within the V-J span"
 ]
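
As an illustration of the codon-based criteria listed in this hunk (not part of the commit), the following minimal Python sketch checks two of them on a toy sequence: locating a start codon and testing for a stop codon in the same reading frame. The function names `find_start_codon` and `has_in_frame_stop` are hypothetical, the sequences are made up, and the sketch does not cover V/J gene spanning or CDR3 detection; it is not how 10x Genomics or Scirpy implement the check.

```python
# Standard DNA stop codons (sense strand).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_start_codon(sequence):
    """Return the index of the first ATG in the sequence, or -1 if absent."""
    return sequence.upper().find("ATG")


def has_in_frame_stop(sequence, start):
    """Return True if a stop codon occurs in the reading frame beginning at `start`."""
    seq = sequence.upper()
    # Walk the sequence codon by codon, skipping any incomplete trailing codon.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


# Toy sequences (assumed for illustration only).
with_stop = "CCATGGCTTGGTAAGC"     # ATG GCT TGG TAA ...
without_stop = "CCATGGCTTGGTACGC"  # ATG GCT TGG TAC ...

start_a = find_start_codon(with_stop)
start_b = find_start_codon(without_stop)
stop_a = has_in_frame_stop(with_stop, start_a)
stop_b = has_in_frame_stop(without_stop, start_b)
```

In this toy setting, a sequence would fail the last criterion whenever `has_in_frame_stop` returns `True` for the frame set by its start codon.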

jupyter-book/chromatin_accessibility/introduction.ipynb

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@
 "tags": []
 },
 "source": [
-"As depicted above, chromatin accessibility is influenced by higher-order structure down to low-level DNA modifications. **(1)** Chromatin scaffolding driven by scaffold/matrix attachment regions (S/MARs) and proteins in the nuclear periphery such as nuclear pore complexes (NPCs) or lamins influences chromatin compactness and gene expression {cite}`atac:narwade_mapping_2019, atac:buchwalter_coaching_2019`. **(2, 3)** More local accessibility often referred to as densly packed heterochromatin versus open euchromatin can be actively controlled by ATP-dependent and ATP-independent chromatin remodeling complexes and histone modifications such as acetylation, methylation and phosphorylation. **(4)** Also the binding of transcription factors can influence nucleosome positioning and lead to the recruitment of histone-modifying enzymes and chromatin remodelers. **(5)** On a DNA level, methylation of CpG sites influences the binding affinity of various proteins including transcription factors and histone-modifying enzymes which combined leads to the silencing of the corresponding genomic regions. For an animated visualization we also recommend [this 2 minute video](https://www.youtube.com/watch?v=XelGO582s4U) on epigenetics and the regulation of gene activity (credits to Nicole Ethen from the SQE, University of Illinois). For a comprehensive and up-to-date review on genome regulation and TF activity, we refer to {cite}`atac:isbel_generating_2022`.\n",
+"As depicted above, chromatin accessibility is influenced by higher-order structure down to low-level DNA modifications. **(1)** Chromatin scaffolding driven by scaffold/matrix attachment regions (S/MARs) and proteins in the nuclear periphery such as nuclear pore complexes (NPCs) or lamins influences chromatin compactness and gene expression {cite}`atac:narwade_mapping_2019, atac:buchwalter_coaching_2019`. **(2, 3)** More local accessibility often referred to as densly packed heterochromatin versus open euchromatin can be actively controlled by ATP-dependent and ATP-independent chromatin remodeling complexes and histone modifications such as acetylation, methylation and phosphorylation. **(4)** Also the binding of transcription factors can influence nucleosome positioning and lead to the recruitment of histone-modifying enzymes and chromatin remodelers. **(5)** On a DNA level, methylation of {term}`CpG` sites influences the binding affinity of various proteins including transcription factors and histone-modifying enzymes which combined leads to the silencing of the corresponding genomic regions. For an animated visualization we also recommend [this 2 minute video](https://www.youtube.com/watch?v=XelGO582s4U) on epigenetics and the regulation of gene activity (credits to Nicole Ethen from the SQE, University of Illinois). For a comprehensive and up-to-date review on genome regulation and TF activity, we refer to {cite}`atac:isbel_generating_2022`.\n",
 "\n",
 "Taken together, an essential component defining cell identity is the regulatory state of each cell. In this chapter, we focus on chromatin accessibility data measured by the **Single-Cell Assay for Transposase-Accessible Chromatin with High-Throughput Sequencing (scATAC-seq)** or as part of the **10x Multiome assay (scATAC combined with scRNA-seq)**. \n",
 "\n",

jupyter-book/glossary.md

Lines changed: 27 additions & 8 deletions
@@ -22,10 +22,14 @@ BAM files
 BAM files are binary, compressed versions of SAM (Sequence Alignment/Map) files that store sequencing read alignments to a reference genome.
 They contain the same information as {term}`SAM` files - including read sequences, quality scores, and alignment positions - but in a more space-efficient format that enables faster processing and reduced storage requirements.
 
+Amplification bias
+A distortion that occurs during DNA or RNA amplification (e.g., PCR), where certain sequences are copied more efficiently than others. This can lead to uneven or inaccurate representation of the original genetic material, affecting results in experiments like sequencing or gene expression analysis.
+
 Barcode
 Barcodes
 Bar code
-Bar codes
+Bar code
+Cell barcode
 Short DNA barcode fragments ("tags") that are used to identify reads originating from the same cell.
 Reads are later grouped by their barcode during raw data processing steps.
 
@@ -37,15 +41,15 @@ Benchmark
 An (independent) comparison of performance of several tools with respect to pre-defined metrics.
 
 Bulk RNA sequencing
-Contrary to single-cell sequencing, bulk sequencing measures the average expression values of several cells.
-Therefore, resolution is lost, but bulk sequencing is usually cheaper, less laborious and faster to analyze.
+bulk RNA-Seq
+bulk sequencing
+Contrary to single-cell sequencing, bulk sequencing measures the average expression values of several cells. Therefore, resolution is lost, but bulk sequencing is usually cheaper, less laborious and faster to analyze.
 
 Cell
+cells
 The fundamental unit of life, consisting of cytoplasm enclosed within a membrane, containing biomolecules such as proteins and nucleic acids.
 Cells acquire specific functions, transition into different types, divide, and communicate to sustain an organism.
 Studying cell structure, activity, and interactions enables insights into gene expression dynamics, cellular trajectories, developmental lineages, and disease mechanisms.
-Cell barcode
-See {term}`barcode`
 
 Cell type annotation
 The process of labeling groups of {term}`clusters` of cells by {term}`cell type`.
@@ -60,6 +64,15 @@ Cell state
 Chromatin
 The complex of DNA and proteins efficiently packaging the DNA inside the nucleus and involved in regulating gene expression.
 
+Codon
+A sequence of three nucleotides corresponding to a specific amino acid or a start/stop signal in protein synthesis.
+Codons are the basic units of the genetic code, determining how genetic information is translated into proteins.
+
+CpG
+A DNA sequence in which a cytosine (C) is followed by a guanine (G) along the 5' &rarr; 3' direction, linked by a phosphodiester bond.
+CpG sites are often found in clusters called CpG islands near gene promoters.
+Unmethylated CpG sites are associated with gene activation, while methylated CpG sites can lead to gene inhibition.
+
 Cluster
 Clusters
 A group of a population or data points that share similarities.
@@ -129,6 +142,12 @@ Indrop
 Library
 Also known as sequencing library. A pool of DNA fragments with attached sequencing adapters.
 
+Modalities
+Multimodal
+Different types of biological information measured at the single-cell level.
+These include gene expression, chromatin accessibility, surface proteins, immune receptor sequences, and spatial organization.
+Combining these modalities provides a more complete understanding of cell identity, function, and interactions.
+
 Locus
 Loci
 loci
@@ -209,9 +228,9 @@ Trajectory inference
 The computational recovery of dynamic processes by ordering cells by similarity or other means.
 
 Unique Molecular Identifier (UMI)
-unique molecular identifiers (UMIs)
-Specific type of molecular barcodes aiding with error correction and increased accuracy during sequencing.
-UMIs unique tag molecules in sample libraries enabling estimation of PCR duplication rates.
+UMI
+A special type of molecular barcode that uniquely tags each molecule in a sample library.
+This, for example, enables the estimation of PCR duplication rates (see {term}`amplification bias`), which leads to error correction and increases accuracy.
 
 Untranslated Region (UTR)
 UTR
jupyter-book/introduction/raw_data_processing.md

Lines changed: 1 addition & 2 deletions
@@ -496,8 +496,7 @@ Several common strategies are used for cell barcode identification and correctio
 After cell barcode (CB) correction, reads have either been discarded or assigned to a corrected CB.
 Subsequently, we wish to quantify the abundance of each gene within each corrected CB.
 
-Because of the amplification bias as discussed in {ref}`exp-data:transcript-quantification`, reads must be deduplicated, based upon their UMI, to assess the true count of sampled molecules.
-Additionally, several other complicating factors present challenges when attempting to perform this estimation.
+Because of the {term}`amplification bias` as discussed in {ref}`exp-data:transcript-quantification`, reads must be deduplicated, based upon their UMI, to assess the true count of sampled molecules. Additionally, several other complicating factors present challenges when attempting to perform this estimation.
 
 The UMI deduplication step aims to identify the set of reads and UMIs derived from each original, pre-PCR molecule in each cell captured and sequenced in the experiment.
 The result of this process is to allocate a molecule count to each gene in each cell, which is subsequently used in the downstream analysis as the raw expression estimate for this gene.
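
As an illustration of the deduplication step described in this hunk (not part of the commit), the sketch below counts molecules per cell barcode and gene by collapsing reads that share the same UMI. It is a deliberately naive version, assuming error-free barcodes and UMIs and reads that each map to exactly one gene; the function name `count_molecules` and the toy reads are hypothetical, and real tools additionally handle UMI sequencing errors and multi-mapping reads.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, gene, umi) tuples, one per read.
    Reads sharing all three values are treated as PCR duplicates of one molecule.
    """
    unique_umis = defaultdict(set)
    for cell_barcode, gene, umi in reads:
        unique_umis[(cell_barcode, gene)].add(umi)
    # The molecule count is the number of distinct UMIs observed.
    return {key: len(umis) for key, umis in unique_umis.items()}


# Toy reads (assumed for illustration only).
reads = [
    ("AAAC", "GeneA", "TTGCA"),
    ("AAAC", "GeneA", "TTGCA"),  # same cell, gene, and UMI: a PCR duplicate
    ("AAAC", "GeneA", "CCGTA"),
    ("AAAC", "GeneB", "TTGCA"),
    ("GGGT", "GeneA", "TTGCA"),
]
counts = count_molecules(reads)
```

Here the five reads collapse to four molecules, because the two identical reads for `GeneA` in cell `AAAC` are counted once.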

jupyter-book/introduction/scrna_seq.bib

Lines changed: 5 additions & 5 deletions
@@ -38,11 +38,11 @@ @Article{Svensson2017
 url={https://doi.org/10.1038/nmeth.4220}
 }
 
-@Article{JOU1972,
-author={JOU, W. MIN
-and HAEGEMAN, G.
-and YSEBAERT, M.
-and FIERS, W.},
+@Article{Jou1972,
+author={Jou, W. Min
+and Haegeman, G.
+and Ysebaert, M.
+and Fiers, W.},
 title={Nucleotide Sequence of the Gene Coding for the Bacteriophage MS2 Coat Protein},
 journal={Nature},
 year={1972},
