Pipeline for Multi-sample Single Cell Data (deprecated)
Source: R/sc_long_multisample_pipeline.R
This function is deprecated. Please use MultiSampleSCPipeline instead.
Usage
sc_long_multisample_pipeline(
annotation,
fastqs,
outdir,
genome_fa,
minimap2 = NULL,
barcodes_file = NULL,
expect_cell_numbers = NULL,
config_file = NULL
)
Arguments
- annotation
The file path to the annotation file in GFF3 or GTF format (the example below uses a GTF file).
- fastqs
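A named character vector of paths to the input FASTQ files, one entry per sample; the names are used as sample names. An entry may also be a directory, in which case the FASTQ files in it are pooled into one sample (as with sampleA in the example below).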
- outdir
The path to directory to store all output files.
- genome_fa
The file path to the genome FASTA file.
- minimap2
Path to minimap2, optional.
- barcodes_file
The file with the expected cell barcodes, one barcode per line. The example below supplies one such file per sample.
- expect_cell_numbers
The expected number of cells in the sample. This is used if barcodes_file is not provided. See BLAZE for more details.
- config_file
File path to the JSON configuration file. If none is provided, a default configuration is created in outdir.
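The shapes of the fastqs and barcodes_file arguments can be sketched in base R. This is a minimal illustration under stated assumptions: the three barcodes are made-up 16-nt sequences (matching the 16-N BC segment in the search pattern printed in the example log), the FASTQ paths are placeholders that are not created here, and the object names (bc_file, fastq_dir, barcodes_files) are hypothetical.

```r
# Hypothetical inputs for illustration only; the FASTQ files are NOT created.
outdir <- tempfile()
dir.create(outdir)

# barcodes_file: plain text, one expected cell barcode per line.
# These 16-nt barcodes are invented for the sketch.
barcodes <- c("AAACCTGAGAAACCAT", "AAACCTGAGAAACCGC", "AAACCTGAGAAACCTA")
bc_file <- file.path(outdir, "bc_allow.tsv")
writeLines(barcodes, bc_file)

# fastqs: named character vector, one entry per sample.
# The names appear to become sample names in the output file names
# (e.g. sampleA_matched_reads.fastq in the example log), so keep them unique.
# An entry may point to a directory; its FASTQ files are pooled as one sample.
fastq_dir <- file.path(outdir, "fastq")
fastqs <- c(
  sampleA = fastq_dir,
  sample1 = file.path(fastq_dir, "sample1.fq.gz"),
  sample2 = file.path(fastq_dir, "sample2.fq.gz")
)

# One barcode allow-list per sample, mirroring rep(bc_allow, 4) in the example.
barcodes_files <- rep(bc_file, length(fastqs))
```

Reusing a single allow-list for every sample, as here, is the simplest choice when all samples come from the same barcode chemistry.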
See also
MultiSampleSCPipeline for the new pipeline interface, SingleCellPipeline for the single-sample pipeline, and BulkPipeline for bulk long-read data.
Examples
# Load the example reads bundled with FLAMES
reads <- ShortRead::readFastq(
  system.file("extdata", "fastq", "musc_rps24.fastq.gz", package = "FLAMES")
)
# Create a temporary output directory with a fastq/ subdirectory
outdir <- tempfile()
dir.create(outdir)
dir.create(file.path(outdir, "fastq"))
# Decompress the barcode allow-list and the genome FASTA into outdir
bc_allow <- file.path(outdir, "bc_allow.tsv")
genome_fa <- file.path(outdir, "rps24.fa")
R.utils::gunzip(
  filename = system.file("extdata", "bc_allow.tsv.gz", package = "FLAMES"),
  destname = bc_allow, remove = FALSE
)
R.utils::gunzip(
  filename = system.file("extdata", "rps24.fa.gz", package = "FLAMES"),
  destname = genome_fa, remove = FALSE
)
# Split the reads into three per-sample FASTQ files:
# the first 100 reads, the next 100 reads, and the remainder
ShortRead::writeFastq(reads[1:100],
  file.path(outdir, "fastq/sample1.fq.gz"), mode = "w", full = FALSE)
reads <- reads[-(1:100)]
ShortRead::writeFastq(reads[1:100],
  file.path(outdir, "fastq/sample2.fq.gz"), mode = "w", full = FALSE)
reads <- reads[-(1:100)]
ShortRead::writeFastq(reads,
  file.path(outdir, "fastq/sample3.fq.gz"), mode = "w", full = FALSE)
# Run the pipeline on four samples: "sampleA" points to the fastq/ directory
# (all three files pooled), the others to the individual files.
# One barcode allow-list is supplied per sample.
sce_list <- FLAMES::sc_long_multisample_pipeline(
  annotation = system.file("extdata", "rps24.gtf.gz", package = "FLAMES"),
  fastqs = c("sampleA" = file.path(outdir, "fastq"),
             "sample1" = file.path(outdir, "fastq", "sample1.fq.gz"),
             "sample2" = file.path(outdir, "fastq", "sample2.fq.gz"),
             "sample3" = file.path(outdir, "fastq", "sample3.fq.gz")),
  outdir = outdir,
  genome_fa = genome_fa,
  barcodes_file = rep(bc_allow, 4)
)
#> sc_long_multisample_pipeline is deprecated, please use MultiSampleSCPipeline instead.
#> No config file provided, creating a default config in /tmp/RtmpgpEV0i/file25bf6201de30
#> Writing configuration parameters to: /tmp/RtmpgpEV0i/file25bf6201de30/config_file_9663.json
#> Configured steps:
#> barcode_demultiplex: TRUE
#> genome_alignment: TRUE
#> gene_quantification: TRUE
#> isoform_identification: TRUE
#> read_realignment: TRUE
#> transcript_quantification: TRUE
#> samtools not found, will use Rsamtools package instead
#> Running step: barcode_demultiplex
#> FLEXIPLEX 0.96.2
#> Setting max barcode edit distance to 2
#> Setting max flanking sequence edit distance to 8
#> Setting read IDs to be replaced
#> Setting number of threads to 8
#> Search pattern:
#> primer: CTACACGACGCTCTTCCGATCT
#> BC: NNNNNNNNNNNNNNNN
#> UMI: NNNNNNNNNNNN
#> polyT: TTTTTTTTT
#> Setting known barcodes from /tmp/RtmpgpEV0i/file25bf6201de30/bc_allow.tsv
#> Number of known barcodes: 143
#> Processing file: /tmp/RtmpgpEV0i/file25bf6201de30/fastq/sample1.fq.gz
#> Searching for barcodes...
#> Processing file: /tmp/RtmpgpEV0i/file25bf6201de30/fastq/sample2.fq.gz
#> Searching for barcodes...
#> Processing file: /tmp/RtmpgpEV0i/file25bf6201de30/fastq/sample3.fq.gz
#> Searching for barcodes...
#> Number of reads processed: 393
#> Number of reads where at least one barcode was found: 368
#> Number of reads with exactly one barcode match: 364
#> Number of chimera reads: 1
#> All done!
#> FLEXIPLEX 0.96.2
#> Setting max barcode edit distance to 2
#> Setting max flanking sequence edit distance to 8
#> Setting read IDs to be replaced
#> Setting number of threads to 8
#> Search pattern:
#> primer: CTACACGACGCTCTTCCGATCT
#> BC: NNNNNNNNNNNNNNNN
#> UMI: NNNNNNNNNNNN
#> polyT: TTTTTTTTT
#> Setting known barcodes from /tmp/RtmpgpEV0i/file25bf6201de30/bc_allow.tsv
#> Number of known barcodes: 143
#> Processing file: /tmp/RtmpgpEV0i/file25bf6201de30/fastq/sample1.fq.gz
#> Searching for barcodes...
#> Number of reads processed: 100
#> Number of reads where at least one barcode was found: 92
#> Number of reads with exactly one barcode match: 91
#> Number of chimera reads: 1
#> All done!
#> FLEXIPLEX 0.96.2
#> Setting max barcode edit distance to 2
#> Setting max flanking sequence edit distance to 8
#> Setting read IDs to be replaced
#> Setting number of threads to 8
#> Search pattern:
#> primer: CTACACGACGCTCTTCCGATCT
#> BC: NNNNNNNNNNNNNNNN
#> UMI: NNNNNNNNNNNN
#> polyT: TTTTTTTTT
#> Setting known barcodes from /tmp/RtmpgpEV0i/file25bf6201de30/bc_allow.tsv
#> Number of known barcodes: 143
#> Processing file: /tmp/RtmpgpEV0i/file25bf6201de30/fastq/sample2.fq.gz
#> Searching for barcodes...
#> Number of reads processed: 100
#> Number of reads where at least one barcode was found: 95
#> Number of reads with exactly one barcode match: 94
#> Number of chimera reads: 0
#> All done!
#> FLEXIPLEX 0.96.2
#> Setting max barcode edit distance to 2
#> Setting max flanking sequence edit distance to 8
#> Setting read IDs to be replaced
#> Setting number of threads to 8
#> Search pattern:
#> primer: CTACACGACGCTCTTCCGATCT
#> BC: NNNNNNNNNNNNNNNN
#> UMI: NNNNNNNNNNNN
#> polyT: TTTTTTTTT
#> Setting known barcodes from /tmp/RtmpgpEV0i/file25bf6201de30/bc_allow.tsv
#> Number of known barcodes: 143
#> Processing file: /tmp/RtmpgpEV0i/file25bf6201de30/fastq/sample3.fq.gz
#> Searching for barcodes...
#> Number of reads processed: 193
#> Number of reads where at least one barcode was found: 181
#> Number of reads with exactly one barcode match: 179
#> Number of chimera reads: 0
#> All done!
#> Running step: genome_alignment
#> Creating junction bed file from GFF3 annotation.
#> Aligning sample /tmp/RtmpgpEV0i/file25bf6201de30/sampleA_matched_reads.fastq -> /tmp/RtmpgpEV0i/file25bf6201de30/sampleA_align2genome.bam
#> Your fastq file appears to have tags, but you did not provide the -y option to minimap2 to include the tags in the output.
#> Warning: samtools not found, using Rsamtools instead, this could be slower and might fail for large BAM files.
#> Sorting BAM files by genome coordinates with 8 threads...
#> Indexing bam files
#> Aligning sample /tmp/RtmpgpEV0i/file25bf6201de30/sample1_matched_reads.fastq -> /tmp/RtmpgpEV0i/file25bf6201de30/sample1_align2genome.bam
#> Your fastq file appears to have tags, but you did not provide the -y option to minimap2 to include the tags in the output.
#> Warning: samtools not found, using Rsamtools instead, this could be slower and might fail for large BAM files.
#> Sorting BAM files by genome coordinates with 8 threads...
#> Indexing bam files
#> Aligning sample /tmp/RtmpgpEV0i/file25bf6201de30/sample2_matched_reads.fastq -> /tmp/RtmpgpEV0i/file25bf6201de30/sample2_align2genome.bam
#> Your fastq file appears to have tags, but you did not provide the -y option to minimap2 to include the tags in the output.
#> Warning: samtools not found, using Rsamtools instead, this could be slower and might fail for large BAM files.
#> Sorting BAM files by genome coordinates with 8 threads...
#> Indexing bam files
#> Aligning sample /tmp/RtmpgpEV0i/file25bf6201de30/sample3_matched_reads.fastq -> /tmp/RtmpgpEV0i/file25bf6201de30/sample3_align2genome.bam
#> Your fastq file appears to have tags, but you did not provide the -y option to minimap2 to include the tags in the output.
#> Warning: samtools not found, using Rsamtools instead, this could be slower and might fail for large BAM files.
#> Sorting BAM files by genome coordinates with 8 threads...
#> Indexing bam files
#> Running step: gene_quantification
#> 03:23:33 AM Wed May 21 2025 quantify genes
#> Found genome alignment file(s): sample1_align2genome.bam
#> sample2_align2genome.bam
#> sample3_align2genome.bam
#> sampleA_align2genome.bam
#> Running step: isoform_identification
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: genome version information is not available for this TxDb object
#> OK
#> Running step: read_realignment
#> Checking for fastq file(s) /tmp/RtmpgpEV0i/file25bf6201de30/fastq, /tmp/RtmpgpEV0i/file25bf6201de30/fastq/sample1.fq.gz, /tmp/RtmpgpEV0i/file25bf6201de30/fastq/sample2.fq.gz, /tmp/RtmpgpEV0i/file25bf6201de30/fastq/sample3.fq.gz
#> files found
#> Checking for fastq file(s) /tmp/RtmpgpEV0i/file25bf6201de30/sampleA_matched_reads.fastq, /tmp/RtmpgpEV0i/file25bf6201de30/sample1_matched_reads.fastq, /tmp/RtmpgpEV0i/file25bf6201de30/sample2_matched_reads.fastq, /tmp/RtmpgpEV0i/file25bf6201de30/sample3_matched_reads.fastq
#> files found
#> Checking for fastq file(s) /tmp/RtmpgpEV0i/file25bf6201de30/sampleA_matched_reads_dedup.fastq, /tmp/RtmpgpEV0i/file25bf6201de30/sample1_matched_reads_dedup.fastq, /tmp/RtmpgpEV0i/file25bf6201de30/sample2_matched_reads_dedup.fastq, /tmp/RtmpgpEV0i/file25bf6201de30/sample3_matched_reads_dedup.fastq
#> files found
#> Realigning sample /tmp/RtmpgpEV0i/file25bf6201de30/sampleA_matched_reads_dedup.fastq -> /tmp/RtmpgpEV0i/file25bf6201de30/sampleA_realign2transcript.bam
#> Warning: samtools not found, using Rsamtools instead, this could be slower and might fail for large BAM files.
#> Sorting BAM files by 8 with CB threads...
#> Realigning sample /tmp/RtmpgpEV0i/file25bf6201de30/sample1_matched_reads_dedup.fastq -> /tmp/RtmpgpEV0i/file25bf6201de30/sample1_realign2transcript.bam
#> Warning: samtools not found, using Rsamtools instead, this could be slower and might fail for large BAM files.
#> Sorting BAM files by 8 with CB threads...
#> Realigning sample /tmp/RtmpgpEV0i/file25bf6201de30/sample2_matched_reads_dedup.fastq -> /tmp/RtmpgpEV0i/file25bf6201de30/sample2_realign2transcript.bam
#> Warning: samtools not found, using Rsamtools instead, this could be slower and might fail for large BAM files.
#> Sorting BAM files by 8 with CB threads...
#> Realigning sample /tmp/RtmpgpEV0i/file25bf6201de30/sample3_matched_reads_dedup.fastq -> /tmp/RtmpgpEV0i/file25bf6201de30/sample3_realign2transcript.bam
#> Warning: samtools not found, using Rsamtools instead, this could be slower and might fail for large BAM files.
#> Sorting BAM files by 8 with CB threads...
#> Running step: transcript_quantification
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: genome version information is not available for this TxDb object
#> OK
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: genome version information is not available for this TxDb object
#> OK
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: genome version information is not available for this TxDb object
#> OK
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: genome version information is not available for this TxDb object
#> OK
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: genome version information is not available for this TxDb object
#> OK
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: genome version information is not available for this TxDb object
#> OK
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: genome version information is not available for this TxDb object
#> OK
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: genome version information is not available for this TxDb object
#> OK
#> Pipeline saved to /tmp/RtmpgpEV0i/file25bf6201de30/pipeline.rds
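In the barcode_demultiplex step, the sampleA entry (the fastq/ directory) is reported separately from the three individual files. A quick arithmetic check, using only the counts printed in the log, is consistent with sampleA pooling the reads of sample1, sample2 and sample3 (for example, 100 + 100 + 193 = 393 reads processed). The variable names below are illustrative; the numbers are copied from the log.

```r
# Demultiplexing counts copied from the barcode_demultiplex log above.
per_sample <- data.frame(
  sample    = c("sample1", "sample2", "sample3"),
  processed = c(100, 100, 193),
  found     = c(92, 95, 181),   # reads with at least one barcode
  single    = c(91, 94, 179),   # reads with exactly one barcode match
  chimera   = c(1, 0, 0)
)
# Counts reported for the pooled "sampleA" directory entry.
sampleA <- c(processed = 393, found = 368, single = 364, chimera = 1)

# Column sums over the three individual files, in the same order as sampleA.
pooled <- colSums(per_sample[, c("processed", "found", "single", "chimera")])
print(all(pooled == sampleA))

# Fraction of reads in which at least one barcode was found, per sample.
print(round(per_sample$found / per_sample$processed, 3))
```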