Skip to contents

Given a set of mutations and gene annotation, calculate the relative position of each mutation within the gene body they are in.

Usage

relative_mutation_positions(
  mutations,
  annotation,
  bin = FALSE,
  by = c(region = "gene_name"),
  threads = 1
)

Arguments

mutations

either the tibble output from find_variants. It must have columns seqnames, pos, and a third column for specifying the gene id or gene name. The mutation must be within the gene region.

annotation

Either path to the annotation file (GTF/GFF) or a GRanges object of the gene annotation.

bin

logical(1): whether to bin the relative positions into 100 bins.

by

character(1): the column name in the annotation to match with the gene annotation. E.g. c("region" = "gene_name") to match the `region` column in the mutations with the `gene_name` column in the annotation.

threads

integer(1): number of threads to use.

Value

If bin = FALSE, a list of numeric vectors of relative positions of each mutation within the gene body. If bin = TRUE, a numeric vector of length 100 of the number of mutations in each bin.

Examples

outdir <- tempfile()
dir.create(outdir)
genome_fa <- system.file("extdata", "rps24.fa.gz", package = "FLAMES")
minimap2_align( # align to genome
  config = jsonlite::fromJSON(
    system.file("extdata", "config_sclr_nanopore_3end.json", package = "FLAMES")),
  fa_file = genome_fa,
  fq_in = system.file("extdata", "fastq", "demultiplexed.fq.gz", package = "FLAMES"),
  annot = system.file("extdata", "rps24.gtf.gz", package = "FLAMES"),
  outdir = outdir
)
#> 06:55:04 AM Mon Jan 06 2025 minimap2_align
#>                                                  total mapped primary secondary
#> /tmp/RtmpqmWKMU/filecd67e8e7fac/align2genome.bam    10     10      10         0
variants <- find_variants(
  bam_path = file.path(outdir, "align2genome.bam"),
  reference = genome_fa,
  annotation = system.file("extdata", "rps24.gtf.gz", package = "FLAMES"),
  min_nucleotide_depth = 4
)
#> 06:55:04 Reading reference ...
#> 06:55:04 Reading annotation ...
#> Merging overlapping genes ...
#> 0 overlapping regions, their gene_name(s) are merged with `, ` as separator
#> 06:55:04 Adding unannotated gaps ...
#> 06:55:04 Got 1 bam file, parallelizing over each region ...
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
#> 
#> 06:55:06 Merging results ...
#> 06:55:06 Calculating homopolymer percentages ...
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |======================================================================| 100%
#> 
positions <- 
 relative_mutation_positions(
   mutations = variants,
   annotation = system.file("extdata", "rps24.gtf.gz", package = "FLAMES")
 )
#> 06:55:07 Reading annotation ...
#> Warning: Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
#>  Please use one dimensional logical vectors instead.
#>  The deprecated feature was likely used in the FLAMES package.
#>   Please report the issue at <https://github.com/mritchielab/FLAMES/issues>.
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
#>