Converts a transcript annotation into a transcriptome assembly in FASTA format. The annotation is first imported as a TxDb object, which is then used to extract the transcript sequences from the genome assembly.
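The two steps described above can be sketched with standard Bioconductor calls. This is a minimal illustration under stated assumptions, not the package's actual implementation: it assumes GenomicFeatures, Biostrings and FLAMES (for the bundled example files) are installed, that exons grouped by transcript are the features of interest, and the output file name `transcript_assembly_sketch.fa` is made up for this example.

```r
library(GenomicFeatures)
library(Biostrings)

gtf <- system.file("extdata", "rps24.gtf.gz", package = "FLAMES")
fa  <- system.file("extdata", "rps24.fa.gz", package = "FLAMES")

# makeTxDbFromGFF() lives in txdbmaker in recent Bioconductor releases and
# in GenomicFeatures in older ones; use whichever is available.
make_txdb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
  txdbmaker::makeTxDbFromGFF
} else {
  GenomicFeatures::makeTxDbFromGFF
}

# Step 1: import the annotation as a TxDb object.
txdb <- make_txdb(gtf)

# Step 2: load the genome and extract one spliced sequence per transcript.
genome <- readDNAStringSet(fa)
# Keep only the first word of each FASTA header so that the names can
# match the sequence names used in the annotation (an assumption here).
names(genome) <- sub(" .*$", "", names(genome))

exons_by_tx <- exonsBy(txdb, by = "tx", use.names = TRUE)
tx_seqs <- extractTranscriptSeqs(genome, exons_by_tx)

# Write the transcript sequences to a FASTA file.
out <- file.path(tempdir(), "transcript_assembly_sketch.fa")
writeXStringSet(tx_seqs, out)
```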
Arguments
- isoform_annotation
Path to the annotation file (GTF/GFF3)
- genome_fa
Path to the genome FASTA file.
- outdir
Path to the directory in which to store the transcriptome as transcript_assembly.fa.
- extract_fn
(optional) Function to extract a GRangesList from the TxDb object, e.g. function(txdb){GenomicFeatures::cdsBy(txdb, by="tx", use.names=TRUE)}
Examples
fasta <- annotation_to_fasta(system.file("extdata", "rps24.gtf.gz", package = "FLAMES"), system.file("extdata", "rps24.fa.gz", package = "FLAMES"), tempdir())
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: The "phase" metadata column contains non-NA values for features of type
#> stop_codon. This information was ignored.
#> Warning: genome version information is not available for this TxDb object
#> OK
cat(readChar(fasta, nchars = 1e3))
#> >ENSMUST00000225994.1
#> CTCTTTTCCTCCTCTCCAGCTCCGGCGCCGTAGCCATCATGAATGACACAGTAACCATCCGGACCAGGAAGTTCATGACC
#> AACCGTCTGCTTCAGAGGAAACAGATGGTCATTGATGTCCTTCATCCTGGGAAGGCAACAGTACCAAAGACAGAAATTCG
#> GGAAAAGCTGGCCAAAATGTACAAAACCACACCAGATGTCATCTTTGTATTTGGATTCAGAACCCACTTCGGTGGTGGCA
#> AGACCACTGGCTTTGGCATGATCTATGATTCTTTAGATTATGCAAAGAAGAATGAGCCTAAACACAGACTGGCAAGAGTA
#> GGTATCTTATTCTTTAATGGATACATGCCTGTAATCCCAGCCTGGACTGAGTTAGGATGGCAAGTTAGATTTTGTTTTCC
#> AGTGTATATTGGGATACAATAAGCAGTTTGGGCCAAACTTGTGGTGTTCTACCTACCCACTAGCCCATCTCCATGCAGTT
#> TCTCTGGCCTTCTGAACTGGAACTTAGCAGTCTTTACTGGGCTTTCTATTTCTAAGTAGTGGAATTGCAGGTGTGCACTG
#> CCATACCTCACTGTTTTGTATGAGCCGTGCTGCCGTCCATGCCTAGGAAAAGTTGGTATCATTTAATGGGAAAGTTACCA
#> TATACAATATACAATGCACACATGGTACCTTTAAAAATGTACAAACTTCATGTAGTCCAAAATGGGTAACAAGCTTGGTT
#> GATGGTCAGGATGCACCTGGCTGGCTTATTTCATTCTTTTAGGTGACTTCTCACTGTCGCCCAGGCTGACCTCCTCTGTC
#> TCCAGCGTGGTGAGATTACAGGCATGGGTTGCCCACCCTGAGTTATTTTCTTGTTTGTTTCATTGGGTTGTTTATTTGAG
#> GCATGGCCTTTAGCACAGGCTAGCCACAAACTGGGAGTTACTTTACCCAGTTTCATGAGTTCTTTAGTCCATTTACTCCT
#> TGGTAT
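The optional extract_fn argument could be used as follows to write coding sequences instead of full transcripts. This is a sketch rather than a tested recipe: it assumes the bundled rps24 annotation contains CDS features, and the `cds` subdirectory is introduced here only so that the output does not overwrite the file written above.

```r
library(FLAMES)

# Separate output directory (hypothetical name) for the CDS-only FASTA.
cds_dir <- file.path(tempdir(), "cds")
dir.create(cds_dir, showWarnings = FALSE)

cds_fasta <- annotation_to_fasta(
  isoform_annotation = system.file("extdata", "rps24.gtf.gz", package = "FLAMES"),
  genome_fa = system.file("extdata", "rps24.fa.gz", package = "FLAMES"),
  outdir = cds_dir,
  # Return a GRangesList of CDS regions grouped by transcript.
  extract_fn = function(txdb) {
    GenomicFeatures::cdsBy(txdb, by = "tx", use.names = TRUE)
  }
)

# Show the start of the resulting FASTA file.
cat(readChar(cds_fasta, nchars = 200))
```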