Chi-square based differential transcription usage analysis. This variant is meant for single cell data.
Takes the SingleCellExperiment
object from sc_long_pipeline
as input.
Alternatively, the path to the output folder could be provided instead of the SCE object.
A cluster annotation file cluster_annotation.csv
is required, please provide this file under the
output folder of sc_long_pipeline
.
Value
a data.frame
containing the following columns:
- gene_id
- differentially transcribed genes
- X_value
- the X value for the DTU gene
- df
- degrees of freedom of the approximate chi-squared distribution of the test statistic
- DTU_tr
- the transcript_id with the highest squared residuals
- DTU_group
- the cell group with the highest squared residuals
- p_value
- the p-value for the test
- adj_p
- the adjusted p-value (by Benjamini–Hochberg correction)
The table is sorted by decreasing P-values. It will also be saved as sc_DTU_analysis.csv
under the
output folder.
Details
This function will search for genes that have at least two isoforms, each with more than min_count
UMI counts.
For each gene, the per cell transcript counts were merged by group to generate pseudo bulk samples.
Grouping is specified by the cluster_annotation.csv
file.
The top 2 highly expressed transcripts for each group were selected and a UMI count matrix where
the rows are selected transcripts and columns are groups was used as input to a chi-square test of independence (chisq.test).
Adjusted P-values were calculated by Benjamini–Hochberg correction.
Examples
outdir <- tempfile()
dir.create(outdir)
bc_allow <- file.path(outdir, "bc_allow.tsv")
genome_fa <- file.path(outdir, "rps24.fa")
R.utils::gunzip(filename = system.file("extdata/bc_allow.tsv.gz", package = "FLAMES"), destname = bc_allow, remove = FALSE)
R.utils::gunzip(filename = system.file("extdata/rps24.fa.gz", package = "FLAMES"), destname = genome_fa, remove = FALSE)
if (!any(is.na(sys_which(c("minimap2", "k8"))))) {
sce <- FLAMES::sc_long_pipeline(
genome_fa = genome_fa,
fastq = system.file("extdata/fastq", package = "FLAMES"),
annotation = system.file("extdata/rps24.gtf.gz", package = "FLAMES"),
outdir = outdir,
barcodes_file = bc_allow
)
group_anno <- data.frame(barcode_seq = colnames(sce), groups = SingleCellExperiment::counts(sce)["ENSMUST00000169826.2", ] > 1)
write.csv(group_anno, file.path(outdir, "cluster_annotation.csv"), row.names = FALSE)
sc_DTU_analysis(sce, min_count = 1)
}