
Convert a transcript annotation to a transcriptome assembly FASTA file. The annotation is first imported as a TxDb object, which is then used to extract transcript sequences from the genome assembly.
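The description above can be sketched with standard Bioconductor calls. This is a minimal illustration under stated assumptions, not the FLAMES source. The helper name annotation_to_fasta_sketch is hypothetical. The sketch assumes txdbmaker is installed (Bioconductor 3.19 or later; older releases export makeTxDbFromGFF from GenomicFeatures instead). It also assumes the genome FASTA sequence names match the annotation's chromosome names once any header description is trimmed.

```r
# Hypothetical sketch for illustration only; NOT the FLAMES implementation.
# Assumes Bioconductor >= 3.19, where makeTxDbFromGFF() is provided by txdbmaker.
library(txdbmaker)        # makeTxDbFromGFF()
library(GenomicFeatures)  # exonsBy(), extractTranscriptSeqs()
library(Biostrings)       # readDNAStringSet(), writeXStringSet()

annotation_to_fasta_sketch <- function(annotation, genome_fa, outdir) {
  # 1. Import the GTF/GFF3 annotation as a TxDb object
  txdb <- txdbmaker::makeTxDbFromGFF(annotation)
  # 2. Group exons by transcript, using transcript IDs as names
  tx_exons <- GenomicFeatures::exonsBy(txdb, by = "tx", use.names = TRUE)
  # 3. Load the genome (gzip-compressed FASTA is supported) and keep only the
  #    first word of each header so names match the annotation's seqnames
  genome <- Biostrings::readDNAStringSet(genome_fa)
  names(genome) <- sub("\\s.*$", "", names(genome))
  # 4. Concatenate each transcript's exon sequences, respecting strand
  tx_seqs <- GenomicFeatures::extractTranscriptSeqs(genome, tx_exons)
  # 5. Write the transcriptome FASTA and return its path
  out <- file.path(outdir, "transcript_assembly.fa")
  Biostrings::writeXStringSet(tx_seqs, out)
  out
}
```

Using a subdirectory of tempdir() as outdir avoids overwriting a transcript_assembly.fa written by an earlier call.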

Usage

annotation_to_fasta(isoform_annotation, genome_fa, outdir, extract_fn)

Arguments

isoform_annotation

Path to the annotation file (GTF/GFF3).

genome_fa

Path to the genome FASTA file.

outdir

Path to the directory in which the transcriptome is written as transcript_assembly.fa.

extract_fn

(Optional) Function that extracts a GRangesList from the TxDb object built from the annotation, e.g. function(txdb){GenomicFeatures::cdsBy(txdb, by="tx", use.names=TRUE)}.
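As a usage sketch for extract_fn, the call below passes the cdsBy extractor from the argument description so that the written FASTA holds coding sequences only. It assumes the FLAMES example files used under Examples and that rps24.gtf.gz contains CDS features; cds_fn and cds_dir are illustrative names, not part of FLAMES.

```r
library(FLAMES)

# Illustrative extractor: group CDS ranges by transcript, named by transcript ID
cds_fn <- function(txdb) {
  GenomicFeatures::cdsBy(txdb, by = "tx", use.names = TRUE)
}

# Separate output directory so the default example output is not overwritten
cds_dir <- file.path(tempdir(), "cds")
dir.create(cds_dir, showWarnings = FALSE)

cds_fasta <- annotation_to_fasta(
  isoform_annotation = system.file("extdata/rps24.gtf.gz", package = "FLAMES"),
  genome_fa = system.file("extdata/rps24.fa.gz", package = "FLAMES"),
  outdir = cds_dir,
  extract_fn = cds_fn
)

# Read the coding sequences back for inspection
cds <- Biostrings::readDNAStringSet(cds_fasta)
```

Transcripts without CDS features (for example, non-coding isoforms) have no entry in the cdsBy result, so they are not expected in this output.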

Value

Path to the output transcriptome assembly FASTA file.

Examples

fasta <- annotation_to_fasta(
  isoform_annotation = system.file("extdata/rps24.gtf.gz", package = "FLAMES"),
  genome_fa = system.file("extdata/rps24.fa.gz", package = "FLAMES"),
  outdir = tempdir()
)
#> Import genomic features from the file as a GRanges object ... 
#> OK
#> Prepare the 'metadata' data frame ... 
#> OK
#> Make the TxDb object ... 
#> Warning: The "phase" metadata column contains non-NA values for features of type
#>   stop_codon. This information was ignored.
#> OK
cat(readChar(fasta, nchars = 1e3))
#> >ENSMUST00000225994.1
#> CTCTTTTCCTCCTCTCCAGCTCCGGCGCCGTAGCCATCATGAATGACACAGTAACCATCCGGACCAGGAAGTTCATGACC
#> AACCGTCTGCTTCAGAGGAAACAGATGGTCATTGATGTCCTTCATCCTGGGAAGGCAACAGTACCAAAGACAGAAATTCG
#> GGAAAAGCTGGCCAAAATGTACAAAACCACACCAGATGTCATCTTTGTATTTGGATTCAGAACCCACTTCGGTGGTGGCA
#> AGACCACTGGCTTTGGCATGATCTATGATTCTTTAGATTATGCAAAGAAGAATGAGCCTAAACACAGACTGGCAAGAGTA
#> GGTATCTTATTCTTTAATGGATACATGCCTGTAATCCCAGCCTGGACTGAGTTAGGATGGCAAGTTAGATTTTGTTTTCC
#> AGTGTATATTGGGATACAATAAGCAGTTTGGGCCAAACTTGTGGTGTTCTACCTACCCACTAGCCCATCTCCATGCAGTT
#> TCTCTGGCCTTCTGAACTGGAACTTAGCAGTCTTTACTGGGCTTTCTATTTCTAAGTAGTGGAATTGCAGGTGTGCACTG
#> CCATACCTCACTGTTTTGTATGAGCCGTGCTGCCGTCCATGCCTAGGAAAAGTTGGTATCATTTAATGGGAAAGTTACCA
#> TATACAATATACAATGCACACATGGTACCTTTAAAAATGTACAAACTTCATGTAGTCCAAAATGGGTAACAAGCTTGGTT
#> GATGGTCAGGATGCACCTGGCTGGCTTATTTCATTCTTTTAGGTGACTTCTCACTGTCGCCCAGGCTGACCTCCTCTGTC
#> TCCAGCGTGGTGAGATTACAGGCATGGGTTGCCCACCCTGAGTTATTTTCTTGTTTGTTTCATTGGGTTGTTTATTTGAG
#> GCATGGCCTTTAGCACAGGCTAGCCACAAACTGGGAGTTACTTTACCCAGTTTCATGAGTTCTTTAGTCCATTTACTCCT
#> TGGTAT