Changes to the 2. chapter: "Single-cell RNA sequencing"! (#329)
* Improve wording, insert missing spaces and symbols (, and .) and add a note field until third generation sequencing (inclusive).
* proofreading of two paragraphs
* Proofreading until the end of the chapter
* again proofreading until RNA sequencing (grammar)
* change v3 of the artifact actions to v4
* adding key takeaways and some missing terms
* improvements based on Lukas comments + added cards with internal links for the key takeaways
* change one sentence
* Update jupyter-book/introduction/scrna_seq.md
Suggestion from Lukas 1
Co-authored-by: Lukas Heumos <[email protected]>
* Update jupyter-book/introduction/scrna_seq.md
Suggestion from Lukas 2
Co-authored-by: Lukas Heumos <[email protected]>
* put key takeaways in a dropdown and made them shorter
* update terms and put further readings into a {seealso} dropdown box
* adding many terms of the glossary
* change to new anchor logic
---------
Co-authored-by: Luis <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
jupyter-book/air_repertoire/clonotype.ipynb
+1 -1 lines changed: 1 addition & 1 deletion
@@ -2033,7 +2033,7 @@
 "source": [
 "Dandelion defines clonotypes using a substitution model based on distances. It was created specifically to deal with the problem of somatic hypermutation in B-cells {cite}`yaari2013models` {cite}`cui2016model`. This model was available in the **Immcantation** suite as an R package {cite}`gupta2015change` {cite}`vander2014presto`. However, Dandelion makes possible to use it, avoiding the complication of moving between code languages and keeping the interoperability with *Scanpy* and *Scirpy*.\n",
 "\n",
-"The model was created based on the probability of a punctual nucleotide change, considering the influence of the immediate two down- and upstream nucleotides {cite}`yaari2013models`. This methodology considered all the possible different 5-mers combinations just for the synonym mutation cases, i.e., those changes where the amino acid represented by the codon is not modified {cite}`yaari2013models`.\n",
+"The model was created based on the probability of a punctual nucleotide change, considering the influence of the immediate two down- and upstream nucleotides {cite}`yaari2013models`. This methodology considered all the possible different 5-mers combinations just for the synonym mutation cases, i.e., those changes where the amino acid represented by the {term}`codon` is not modified {cite}`yaari2013models`.\n",
 "\n",
 "Furthermore, Dandelion considers a model of substitution rates for single nucleotide instead of the 5-mer model. Therefore, all the substitutions are not changing, and they are displayed in the table below:\n",
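Reviewer note: the paragraph touched by this hunk relies on two ideas, the 5-mer context around a mutated base and the notion of a synonymous change. The following is a minimal sketch of both, assuming the standard genetic code. The function names are hypothetical and this is not how Dandelion or Immcantation implement their substitution models.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (assumption: no alternative code tables).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon: str, position: int, new_base: str) -> bool:
    """Return True if replacing the base at `position` leaves the encoded amino acid unchanged."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


def five_mer_context(sequence: str, index: int):
    """Return the 5-mer centred on `index` (two bases up- and downstream), or None near the ends."""
    if index < 2 or index + 2 >= len(sequence):
        return None
    return sequence[index - 2:index + 3]


print(is_synonymous("GGT", 2, "C"))    # GGT and GGC both encode glycine
print(five_mer_context("ACGTACG", 3))  # context around the T at index 3
```

A 5-mer model would tabulate substitution rates per such context, whereas the single-nucleotide model mentioned in the last context line ignores the neighbouring bases.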
jupyter-book/air_repertoire/ir_profiling.ipynb
+1 -1 lines changed: 1 addition & 1 deletion
@@ -564,7 +564,7 @@
 "Even though we can detect the AIR sequence, it might not be productive, i.e., it might not form a valid AIR. Sequences, which do not result in functional AIRs, are therefore flagged as non-productive. These are usually ignored, when loading data by tools such as Scirpy, and not used for any downstream analysis.\n",
 "Productive Immune receptors are defined by 10x Genomics [here](https://kb.10xgenomics.com/hc/en-us/articles/115003248383-What-are-productive-contigs-) as:\n",
 "- Sequences spanning over from a V gene to a J-gene\n",
-"- Having a start codon in the leading region\n",
+"- Having a start {term}`codon` in the leading region\n",
 "- Containing a CDR3 in the frame of the start codon.\n",
 "- Do not contain a stop codon within the V-J span"
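Reviewer note: the codon-related criteria in this list can be illustrated with a small sketch. It assumes the contig is already annotated, so the positions `start`, `cdr3_start` and `j_end` are hypothetical inputs, and it covers only the three codon checks. It is not the 10x Genomics or Scirpy implementation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def passes_codon_checks(contig: str, start: int, cdr3_start: int, j_end: int) -> bool:
    """Simplified check of the codon-based criteria for a productive contig.

    start      -- index of the first base of the candidate start codon (hypothetical annotation)
    cdr3_start -- index of the first base of the CDR3 (hypothetical annotation)
    j_end      -- exclusive end index of the V-J span (hypothetical annotation)
    """
    contig = contig.upper()
    # Criterion: a start codon is present at the annotated position.
    if contig[start:start + 3] != "ATG":
        return False
    # Criterion: the CDR3 lies in the reading frame set by the start codon.
    if (cdr3_start - start) % 3 != 0:
        return False
    # Criterion: no in-frame stop codon between the start codon and the end of the span.
    codons = (contig[i:i + 3] for i in range(start, j_end - 2, 3))
    return not any(codon in STOP_CODONS for codon in codons)


print(passes_codon_checks("ATGGCCTGTGCTAAA", start=0, cdr3_start=6, j_end=15))
```

The first criterion (spanning from a V gene to a J gene) depends on gene annotation and is deliberately left out of the sketch.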
jupyter-book/chromatin_accessibility/introduction.ipynb
+1 -1 lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@
 "tags": []
 },
 "source": [
-"As depicted above, chromatin accessibility is influenced by higher-order structure down to low-level DNA modifications. **(1)** Chromatin scaffolding driven by scaffold/matrix attachment regions (S/MARs) and proteins in the nuclear periphery such as nuclear pore complexes (NPCs) or lamins influences chromatin compactness and gene expression {cite}`atac:narwade_mapping_2019, atac:buchwalter_coaching_2019`. **(2, 3)** More local accessibility often referred to as densly packed heterochromatin versus open euchromatin can be actively controlled by ATP-dependent and ATP-independent chromatin remodeling complexes and histone modifications such as acetylation, methylation and phosphorylation. **(4)** Also the binding of transcription factors can influence nucleosome positioning and lead to the recruitment of histone-modifying enzymes and chromatin remodelers. **(5)** On a DNA level, methylation of CpG sites influences the binding affinity of various proteins including transcription factors and histone-modifying enzymes which combined leads to the silencing of the corresponding genomic regions. For an animated visualization we also recommend [this 2 minute video](https://www.youtube.com/watch?v=XelGO582s4U) on epigenetics and the regulation of gene activity (credits to Nicole Ethen from the SQE, University of Illinois). For a comprehensive and up-to-date review on genome regulation and TF activity, we refer to {cite}`atac:isbel_generating_2022`.\n",
+"As depicted above, chromatin accessibility is influenced by higher-order structure down to low-level DNA modifications. **(1)** Chromatin scaffolding driven by scaffold/matrix attachment regions (S/MARs) and proteins in the nuclear periphery such as nuclear pore complexes (NPCs) or lamins influences chromatin compactness and gene expression {cite}`atac:narwade_mapping_2019, atac:buchwalter_coaching_2019`. **(2, 3)** More local accessibility often referred to as densly packed heterochromatin versus open euchromatin can be actively controlled by ATP-dependent and ATP-independent chromatin remodeling complexes and histone modifications such as acetylation, methylation and phosphorylation. **(4)** Also the binding of transcription factors can influence nucleosome positioning and lead to the recruitment of histone-modifying enzymes and chromatin remodelers. **(5)** On a DNA level, methylation of {term}`CpG` sites influences the binding affinity of various proteins including transcription factors and histone-modifying enzymes which combined leads to the silencing of the corresponding genomic regions. For an animated visualization we also recommend [this 2 minute video](https://www.youtube.com/watch?v=XelGO582s4U) on epigenetics and the regulation of gene activity (credits to Nicole Ethen from the SQE, University of Illinois). For a comprehensive and up-to-date review on genome regulation and TF activity, we refer to {cite}`atac:isbel_generating_2022`.\n",
 "\n",
 "Taken together, an essential component defining cell identity is the regulatory state of each cell. In this chapter, we focus on chromatin accessibility data measured by the **Single-Cell Assay for Transposase-Accessible Chromatin with High-Throughput Sequencing (scATAC-seq)** or as part of the **10x Multiome assay (scATAC combined with scRNA-seq)**. \n",
jupyter-book/glossary.md
+27 -8 lines changed: 27 additions & 8 deletions
@@ -22,10 +22,14 @@ BAM files
 BAM files are binary, compressed versions of SAM (Sequence Alignment/Map) files that store sequencing read alignments to a reference genome.
 They contain the same information as {term}`SAM` files - including read sequences, quality scores, and alignment positions - but in a more space-efficient format that enables faster processing and reduced storage requirements.
 
+Amplification bias
+A distortion that occurs during DNA or RNA amplification (e.g., PCR), where certain sequences are copied more efficiently than others. This can lead to uneven or inaccurate representation of the original genetic material, affecting results in experiments like sequencing or gene expression analysis.
+
 Barcode
 Barcodes
 Bar code
-Bar codes
+Bar code
+Cell barcode
 Short DNA barcode fragments ("tags") that are used to identify reads originating from the same cell.
 Reads are later grouped by their barcode during raw data processing steps.
 
@@ -37,15 +41,15 @@ Benchmark
 An (independent) comparison of performance of several tools with respect to pre-defined metrics.
 
 Bulk RNA sequencing
-Contrary to single-cell sequencing, bulk sequencing measures the average expression values of several cells.
-Therefore, resolution is lost, but bulk sequencing is usually cheaper, less laborious and faster to analyze.
+bulk RNA-Seq
+bulk sequencing
+Contrary to single-cell sequencing, bulk sequencing measures the average expression values of several cells. Therefore, resolution is lost, but bulk sequencing is usually cheaper, less laborious and faster to analyze.
 
 Cell
+cells
 The fundamental unit of life, consisting of cytoplasm enclosed within a membrane, containing biomolecules such as proteins and nucleic acids.
 Cells acquire specific functions, transition into different types, divide, and communicate to sustain an organism.
 Studying cell structure, activity, and interactions enables insights into gene expression dynamics, cellular trajectories, developmental lineages, and disease mechanisms.
-Cell barcode
-See {term}`barcode`
 
 Cell type annotation
 The process of labeling groups of {term}`clusters` of cells by {term}`cell type`.
@@ -60,6 +64,15 @@ Cell state
 Chromatin
 The complex of DNA and proteins efficiently packaging the DNA inside the nucleus and involved in regulating gene expression.
 
+Codon
+A sequence of three nucleotides corresponding to a specific amino acid or a start/stop signal in protein synthesis.
+Codons are the basic units of the genetic code, determining how genetic information is translated into proteins.
+
+CpG
+A DNA sequence in which a cytosine (C) is followed by a guanine (G) along the 5' → 3' direction, linked by a phosphodiester bond.
+CpG sites are often found in clusters called CpG islands near gene promoters.
+Unmethylated CpG sites are associated with gene activation, while methylated CpG sites can lead to gene inhibition.
+
 Cluster
 Clusters
 A group of a population or data points that share similarities.
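Reviewer note: the new CpG entry defines a CpG site as a C immediately followed by a G on the same strand. A minimal sketch of counting such sites in a sequence follows. The observed/expected ratio is a commonly used heuristic for spotting CpG-rich regions and is included here as an assumption, not as something the glossary defines. Both function names are hypothetical.

```python
def count_cpg(sequence: str) -> int:
    """Count CpG sites: positions where C is directly followed by G (read 5' to 3')."""
    seq = sequence.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def cpg_observed_expected(sequence: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = sequence.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return count_cpg(seq) * len(seq) / (c_count * g_count)


print(count_cpg("ACGCGT"))
print(cpg_observed_expected("ACGCGT"))
```

Note that this only looks at sequence composition. Methylation status, which the glossary entry ties to gene activation or inhibition, requires a dedicated assay and cannot be read from the sequence alone.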
@@ -129,6 +142,12 @@ Indrop
 Library
 Also known as sequencing library. A pool of DNA fragments with attached sequencing adapters.
 
+Modalities
+Multimodal
+Different types of biological information measured at the single-cell level.
+These include gene expression, chromatin accessibility, surface proteins, immune receptor sequences, and spatial organization.
+Combining these modalities provides a more complete understanding of cell identity, function, and interactions.
+
 Locus
 Loci
 loci
@@ -209,9 +228,9 @@ Trajectory inference
 The computational recovery of dynamic processes by ordering cells by similarity or other means.
 
 Unique Molecular Identifier (UMI)
-unique molecular identifiers (UMIs)
-Specific type of molecular barcodes aiding with error correction and increased accuracy during sequencing.
-UMIs unique tag molecules in sample libraries enabling estimation of PCR duplication rates.
+UMI
+A special type of molecular barcode that uniquely tags each molecule in a sample library.
+This, for example, enables the estimation of PCR duplication rates (see {term}`amplification bias`), which leads to error correction and increases accuracy.
jupyter-book/introduction/raw_data_processing.md
+1 -2 lines changed: 1 addition & 2 deletions
@@ -496,8 +496,7 @@ Several common strategies are used for cell barcode identification and correctio
 After cell barcode (CB) correction, reads have either been discarded or assigned to a corrected CB.
 Subsequently, we wish to quantify the abundance of each gene within each corrected CB.
 
-Because of the amplification bias as discussed in {ref}`exp-data:transcript-quantification`, reads must be deduplicated, based upon their UMI, to assess the true count of sampled molecules.
-Additionally, several other complicating factors present challenges when attempting to perform this estimation.
+Because of the {term}`amplification bias` as discussed in {ref}`exp-data:transcript-quantification`, reads must be deduplicated, based upon their UMI, to assess the true count of sampled molecules. Additionally, several other complicating factors present challenges when attempting to perform this estimation.
 
 The UMI deduplication step aims to identify the set of reads and UMIs derived from each original, pre-PCR molecule in each cell captured and sequenced in the experiment.
 The result of this process is to allocate a molecule count to each gene in each cell, which is subsequently used in the downstream analysis as the raw expression estimate for this gene.
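Reviewer note: the deduplication described in this hunk can be sketched in its simplest form, where reads sharing the same corrected cell barcode, gene and UMI are treated as PCR copies of one molecule. The input format and function name are hypothetical. The sketch deliberately ignores the complicating factors the text alludes to, such as sequencing errors in UMIs and reads that map to several genes.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads to molecule counts per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, gene, umi) tuples, assumed to be
    already barcode-corrected and assigned to exactly one gene.
    """
    umis_seen = defaultdict(set)
    for cell_barcode, gene, umi in reads:
        # Identical (barcode, gene, UMI) triples are PCR duplicates and counted once.
        umis_seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


reads = [
    ("AAAC", "GeneA", "TTGCA"),
    ("AAAC", "GeneA", "TTGCA"),  # PCR duplicate of the read above
    ("AAAC", "GeneA", "GGATC"),
    ("CCGT", "GeneA", "TTGCA"),  # same UMI, different cell: a separate molecule
]
print(count_molecules(reads))
```

The resulting dictionary plays the role of the raw count matrix entries: one molecule count per gene per cell.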