Removes isoform annotations that could produce ambiguous reads, such as isoforms that differ only at the 5' / 3' end. This can be useful when plotting average coverage.
Arguments
- annotation
path to the GTF annotation file, or the parsed GenomicRanges object.
- keep
string, one of 'tss_differ' (only keep isoforms that all differ by the transcription start site position), 'tes_differ' (only keep those that differ by the transcription end site position), 'both' (only keep those that differ by both the start and end site), or 'single_transcripts' (only keep genes that contain a single transcript).
Examples
filtered_annotation <- filter_annotation(
system.file("extdata", "rps24.gtf.gz", package = 'FLAMES'), keep = 'tes_differ')
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: The "phase" metadata column contains non-NA values for features of type
#> stop_codon. This information was ignored.
#> OK
filtered_annotation
#> GRanges object with 6 ranges and 2 metadata columns:
#> seqnames ranges strand | tx_id tx_name
#> <Rle> <IRanges> <Rle> | <integer> <character>
#> [1] chr14 19-5159 + | 1 ENSMUST00000225994.1
#> [2] chr14 32-3389 + | 6 ENSMUST00000225117.1
#> [3] chr14 68-5124 + | 7 ENSMUST00000224568.1
#> [4] chr14 86-1118 + | 8 ENSMUST00000224549.1
#> [5] chr14 160-2761 + | 9 ENSMUST00000224569.1
#> [6] chr14 450-1290 + | 10 ENSMUST00000224699.1
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
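To see how the choice of keep changes the result, the same call can be repeated with a different option. The following is a minimal sketch rather than part of the package's own examples: it uses only the function, arguments, and bundled GTF file shown above, and the variable names (gtf, keep_options, filtered) are illustrative. The counts depend on the annotation, so no output is shown.

```r
library(FLAMES)

# Bundled example annotation, as used in the example above
gtf <- system.file("extdata", "rps24.gtf.gz", package = "FLAMES")

# Two of the documented `keep` options; the names here are illustrative only
keep_options <- c("tss_differ", "tes_differ")

# Run the filter once per option and collect the results in a named list
filtered <- lapply(keep_options, function(k) filter_annotation(gtf, keep = k))
names(filtered) <- keep_options

# Number of transcripts retained under each option
vapply(filtered, length, integer(1))
```

Comparing the retained counts is a quick way to check which filter suits a given coverage plot before committing to one.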