
Convert a transcript annotation into a transcriptome assembly in FASTA format. The annotation is first imported as a TxDb object, which is then used to extract the transcript sequences from the genome assembly.
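The two steps described above (annotation to TxDb, then sequence extraction) can be sketched with standard Bioconductor calls. This is a minimal sketch, not the FLAMES source: the name annotation_to_fasta_sketch is hypothetical, and the code assumes Biostrings, GenomicFeatures and txdbmaker are installed (recent Bioconductor releases provide makeTxDbFromGFF in txdbmaker; older ones provide it in GenomicFeatures). The toy genome and GTF at the end are made-up inputs for illustration only.

```r
# Hypothetical sketch of an annotation-to-FASTA conversion (not FLAMES code).
# Assumes Biostrings, GenomicFeatures and txdbmaker are installed.
annotation_to_fasta_sketch <- function(isoform_annotation, genome_fa, outfile,
                                       extract_fn) {
  # Read the genome and keep only the first word of each FASTA header, so
  # sequence names match the seqnames used in the annotation
  genome <- Biostrings::readDNAStringSet(genome_fa)
  names(genome) <- sub(" .*$", "", names(genome))

  # Import the GTF/GFF3 annotation as a TxDb object
  txdb <- txdbmaker::makeTxDbFromGFF(isoform_annotation)

  # By default group exons by transcript; otherwise use the caller's ranges
  ranges <- if (missing(extract_fn)) {
    GenomicFeatures::exonsBy(txdb, by = "tx", use.names = TRUE)
  } else {
    extract_fn(txdb)
  }

  # Join the ranges of each transcript in order (minus-strand ranges are
  # reverse-complemented) and write the resulting sequences as FASTA
  tx_seqs <- GenomicFeatures::extractTranscriptSeqs(genome, ranges)
  Biostrings::writeXStringSet(tx_seqs, outfile)
  invisible(NULL)
}

# Toy usage with made-up inputs: one two-exon transcript "t1" on chr1,
# covering positions 1-4 and 9-12 on the plus strand
genome_file <- tempfile(fileext = ".fa")
writeLines(c(">chr1 toy chromosome", "AAAACCCCGGGGTTTTACGT"), genome_file)
gtf_file <- tempfile(fileext = ".gtf")
writeLines(c(
  'chr1\ttoy\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
  'chr1\ttoy\texon\t9\t12\t.\t+\t.\tgene_id "g1"; transcript_id "t1";'
), gtf_file)
out_file <- tempfile(fileext = ".fa")
annotation_to_fasta_sketch(gtf_file, genome_file, out_file)
seqs <- Biostrings::readDNAStringSet(out_file)
seqs
```

Reading the genome with readDNAStringSet keeps the sketch simple for gzip-compressed FASTA files; for large genomes an indexed file (for example Rsamtools::FaFile) would avoid loading every chromosome into memory.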

Usage

annotation_to_fasta(isoform_annotation, genome_fa, outfile, extract_fn)

Arguments

isoform_annotation

The file path to the annotation file (GTF or GFF3).

genome_fa

The file path to the genome FASTA file.

outfile

The file path to the output FASTA file.

extract_fn

(optional) A function that extracts a GRangesList from the TxDb object created from the annotation. E.g. function(txdb){GenomicFeatures::cdsBy(txdb, by="tx", use.names=TRUE)}
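To illustrate extract_fn, the call below passes the cdsBy function from the argument description, so that CDS ranges grouped by transcript, rather than full transcripts, are extracted. It assumes FLAMES and GenomicFeatures are installed and reuses the example data shipped with FLAMES; cds_fasta is only an illustrative variable name.

```r
# Extract coding sequences instead of full transcripts (illustrative sketch).
# Assumes FLAMES and GenomicFeatures are installed.
cds_fasta <- tempfile(fileext = ".fa")
FLAMES::annotation_to_fasta(
  isoform_annotation = system.file("extdata", "rps24.gtf.gz", package = "FLAMES"),
  genome_fa = system.file("extdata", "rps24.fa.gz", package = "FLAMES"),
  outfile = cds_fasta,
  extract_fn = function(txdb) {
    # CDS ranges grouped by transcript, named by transcript ID
    GenomicFeatures::cdsBy(txdb, by = "tx", use.names = TRUE)
  }
)
```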

Value

This function does not return a value; it writes a FASTA file to the path given by outfile.

Examples

fasta <- tempfile()
annotation_to_fasta(
  isoform_annotation = system.file("extdata", "rps24.gtf.gz", package = "FLAMES"),
  genome_fa = system.file("extdata", "rps24.fa.gz", package = "FLAMES"),
  outfile = fasta
)
#> Import genomic features from the file as a GRanges object ... 
#> OK
#> Prepare the 'metadata' data frame ... 
#> OK
#> Make the TxDb object ... 
#> Warning: The "phase" metadata column contains non-NA values for features of type
#>   stop_codon. This information was ignored.
#> Warning: genome version information is not available for this TxDb object
#> OK
cat(readChar(fasta, 1e3))
#> >ENSMUST00000225994.1
#> CTCTTTTCCTCCTCTCCAGCTCCGGCGCCGTAGCCATCATGAATGACACAGTAACCATCCGGACCAGGAAGTTCATGACC
#> AACCGTCTGCTTCAGAGGAAACAGATGGTCATTGATGTCCTTCATCCTGGGAAGGCAACAGTACCAAAGACAGAAATTCG
#> GGAAAAGCTGGCCAAAATGTACAAAACCACACCAGATGTCATCTTTGTATTTGGATTCAGAACCCACTTCGGTGGTGGCA
#> AGACCACTGGCTTTGGCATGATCTATGATTCTTTAGATTATGCAAAGAAGAATGAGCCTAAACACAGACTGGCAAGAGTA
#> GGTATCTTATTCTTTAATGGATACATGCCTGTAATCCCAGCCTGGACTGAGTTAGGATGGCAAGTTAGATTTTGTTTTCC
#> AGTGTATATTGGGATACAATAAGCAGTTTGGGCCAAACTTGTGGTGTTCTACCTACCCACTAGCCCATCTCCATGCAGTT
#> TCTCTGGCCTTCTGAACTGGAACTTAGCAGTCTTTACTGGGCTTTCTATTTCTAAGTAGTGGAATTGCAGGTGTGCACTG
#> CCATACCTCACTGTTTTGTATGAGCCGTGCTGCCGTCCATGCCTAGGAAAAGTTGGTATCATTTAATGGGAAAGTTACCA
#> TATACAATATACAATGCACACATGGTACCTTTAAAAATGTACAAACTTCATGTAGTCCAAAATGGGTAACAAGCTTGGTT
#> GATGGTCAGGATGCACCTGGCTGGCTTATTTCATTCTTTTAGGTGACTTCTCACTGTCGCCCAGGCTGACCTCCTCTGTC
#> TCCAGCGTGGTGAGATTACAGGCATGGGTTGCCCACCCTGAGTTATTTTCTTGTTTGTTTCATTGGGTTGTTTATTTGAG
#> GCATGGCCTTTAGCACAGGCTAGCCACAAACTGGGAGTTACTTTACCCAGTTTCATGAGTTCTTTAGTCCATTTACTCCT
#> TGGTAT
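Since the function only writes a file, its result is inspected by reading the FASTA back into R. A minimal sketch, assuming FLAMES and Biostrings are installed; it repeats the example call so that it runs on its own.

```r
# Write the example transcriptome, then load it back for inspection.
# Assumes FLAMES and Biostrings are installed.
fasta <- tempfile(fileext = ".fa")
FLAMES::annotation_to_fasta(
  system.file("extdata", "rps24.gtf.gz", package = "FLAMES"),
  system.file("extdata", "rps24.fa.gz", package = "FLAMES"),
  fasta
)
tx <- Biostrings::readDNAStringSet(fasta)
length(tx)                          # number of transcript sequences written
names(tx)                           # transcript IDs from the annotation
nchar(as.character(tx))             # transcript lengths in nucleotides
```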