Trim TSO adapter with cutadapt
Examples
cutadapt("-h")
#> [1] "cutadapt version 4.9"
#> [2] ""
#> [3] "Copyright (C) 2010 Marcel Martin <marcel.martin@scilifelab.se> and contributors"
#> [4] ""
#> [5] "Cutadapt removes adapter sequences from high-throughput sequencing reads."
#> [6] ""
#> [7] "Usage:"
#> [8] " cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq"
#> [9] ""
#> [10] "For paired-end reads:"
#> [11] " cutadapt -a ADAPT1 -A ADAPT2 [options] -o out1.fastq -p out2.fastq in1.fastq in2.fastq"
#> [12] ""
#> [13] "Replace \"ADAPTER\" with the actual sequence of your 3' adapter. IUPAC wildcard"
#> [14] "characters are supported. All reads from input.fastq will be written to"
#> [15] "output.fastq with the adapter sequence removed. Adapter matching is"
#> [16] "error-tolerant. Multiple adapter sequences can be given (use further -a"
#> [17] "options), but only the best-matching adapter will be removed."
#> [18] ""
#> [19] "Input may also be in FASTA format. Compressed input and output is supported and"
#> [20] "auto-detected from the file name (.gz, .xz, .bz2). Use the file name '-' for"
#> [21] "standard input/output. Without the -o option, output is sent to standard output."
#> [22] ""
#> [23] "Citation:"
#> [24] ""
#> [25] "Marcel Martin. Cutadapt removes adapter sequences from high-throughput"
#> [26] "sequencing reads. EMBnet.Journal, 17(1):10-12, May 2011."
#> [27] "http://dx.doi.org/10.14806/ej.17.1.200"
#> [28] ""
#> [29] "Run \"cutadapt --help\" to see all command-line options."
#> [30] "See https://cutadapt.readthedocs.io/ for full documentation."
#> [31] ""
#> [32] "Options:"
#> [33] " -h, --help Show this help message and exit"
#> [34] " --version Show version number and exit"
#> [35] " --debug Print debug log. Use twice to also print DP matrices"
#> [36] " -j CORES, --cores CORES"
#> [37] " Number of CPU cores to use. Use 0 to auto-detect."
#> [38] " Default: 1"
#> [39] ""
#> [40] "Finding adapters:"
#> [41] " Parameters -a, -g, -b specify adapters to be removed from each read (or from"
#> [42] " R1 if data is paired-end. If specified multiple times, only the best"
#> [43] " matching adapter is trimmed (but see the --times option). Use notation"
#> [44] " 'file:FILE' to read adapter sequences from a FASTA file."
#> [45] ""
#> [46] " -a ADAPTER, --adapter ADAPTER"
#> [47] " Sequence of an adapter ligated to the 3' end (paired"
#> [48] " data: of the first read). The adapter and subsequent"
#> [49] " bases are trimmed. If a '$' character is appended"
#> [50] " ('anchoring'), the adapter is only found if it is a"
#> [51] " suffix of the read."
#> [52] " -g ADAPTER, --front ADAPTER"
#> [53] " Sequence of an adapter ligated to the 5' end (paired"
#> [54] " data: of the first read). The adapter and any preceding"
#> [55] " bases are trimmed. Partial matches at the 5' end are"
#> [56] " allowed. If a '^' character is prepended ('anchoring'),"
#> [57] " the adapter is only found if it is a prefix of the read."
#> [58] " -b ADAPTER, --anywhere ADAPTER"
#> [59] " Sequence of an adapter that may be ligated to the 5' or"
#> [60] " 3' end (paired data: of the first read). Both types of"
#> [61] " matches as described under -a and -g are allowed. If the"
#> [62] " first base of the read is part of the match, the"
#> [63] " behavior is as with -g, otherwise as with -a. This"
#> [64] " option is mostly for rescuing failed library"
#> [65] " preparations - do not use if you know which end your"
#> [66] " adapter was ligated to!"
#> [67] " -e E, --error-rate E, --errors E"
#> [68] " Maximum allowed error rate (if 0 <= E < 1), or absolute"
#> [69] " number of errors for full-length adapter match (if E is"
#> [70] " an integer >= 1). Error rate = no. of errors divided by"
#> [71] " length of matching region. Default: 0.1 (10%)"
#> [72] " --no-indels Allow only mismatches in alignments. Default: allow both"
#> [73] " mismatches and indels"
#> [74] " -n COUNT, --times COUNT"
#> [75] " Remove up to COUNT adapters from each read. Default: 1"
#> [76] " -O MINLENGTH, --overlap MINLENGTH"
#> [77] " Require MINLENGTH overlap between read and adapter for"
#> [78] " an adapter to be found. Default: 3"
#> [79] " --match-read-wildcards"
#> [80] " Interpret IUPAC wildcards in reads. Default: False"
#> [81] " -N, --no-match-adapter-wildcards"
#> [82] " Do not interpret IUPAC wildcards in adapters."
#> [83] " --action {trim,retain,mask,lowercase,crop,none}"
#> [84] " What to do if a match was found. trim: trim adapter and"
#> [85] " up- or downstream sequence; retain: trim, but retain"
#> [86] " adapter; mask: replace with 'N' characters; lowercase:"
#> [87] " convert to lowercase; crop: trim up and downstream"
#> [88] " sequence; none: leave unchanged. Default: trim"
#> [89] " --rc, --revcomp Check both the read and its reverse complement for"
#> [90] " adapter matches. If match is on reverse-complemented"
#> [91] " version, output that one. Default: check only read"
#> [92] ""
#> [93] "Additional read modifications:"
#> [94] " -u LEN, --cut LEN Remove LEN bases from each read (or R1 if paired; use -U"
#> [95] " option for R2). If LEN is positive, remove bases from"
#> [96] " the beginning. If LEN is negative, remove bases from the"
#> [97] " end. Can be used twice if LENs have different signs."
#> [98] " Applied *before* adapter trimming."
#> [99] " --nextseq-trim 3'CUTOFF"
#> [100] " NextSeq-specific quality trimming (each read). Trims"
#> [101] " also dark cycles appearing as high-quality G bases."
#> [102] " -q [5'CUTOFF,]3'CUTOFF, --quality-cutoff [5'CUTOFF,]3'CUTOFF"
#> [103] " Trim low-quality bases from 5' and/or 3' ends of each"
#> [104] " read before adapter removal. Applied to both reads if"
#> [105] " data is paired. If one value is given, only the 3' end"
#> [106] " is trimmed. If two comma-separated cutoffs are given,"
#> [107] " the 5' end is trimmed with the first cutoff, the 3' end"
#> [108] " with the second."
#> [109] " --quality-base N Assume that quality values in FASTQ are encoded as"
#> [110] " ascii(quality + N). This needs to be set to 64 for some"
#> [111] " old Illumina FASTQ files. Default: 33"
#> [112] " --poly-a Trim poly-A tails"
#> [113] " --length LENGTH, -l LENGTH"
#> [114] " Shorten reads to LENGTH. Positive values remove bases at"
#> [115] " the end while negative ones remove bases at the"
#> [116] " beginning. This and the following modifications are"
#> [117] " applied after adapter trimming."
#> [118] " --trim-n Trim N's on ends of reads."
#> [119] " --length-tag TAG Search for TAG followed by a decimal number in the"
#> [120] " description field of the read. Replace the decimal"
#> [121] " number with the correct length of the trimmed read. For"
#> [122] " example, use --length-tag 'length=' to correct fields"
#> [123] " like 'length=123'."
#> [124] " --strip-suffix STRIP_SUFFIX"
#> [125] " Remove this suffix from read names if present. Can be"
#> [126] " given multiple times."
#> [127] " -x PREFIX, --prefix PREFIX"
#> [128] " Add this prefix to read names. Use {name} to insert the"
#> [129] " name of the matching adapter."
#> [130] " -y SUFFIX, --suffix SUFFIX"
#> [131] " Add this suffix to read names; can also include {name}"
#> [132] " --rename TEMPLATE Rename reads using TEMPLATE containing variables such as"
#> [133] " {id}, {adapter_name} etc. (see documentation)"
#> [134] " --zero-cap, -z Change negative quality values to zero."
#> [135] ""
#> [136] "Filtering of processed reads:"
#> [137] " Filters are applied after above read modifications. Paired-end reads are"
#> [138] " always discarded pairwise (see also --pair-filter)."
#> [139] ""
#> [140] " -m LEN[:LEN2], --minimum-length LEN[:LEN2]"
#> [141] " Discard reads shorter than LEN. Default: 0"
#> [142] " -M LEN[:LEN2], --maximum-length LEN[:LEN2]"
#> [143] " Discard reads longer than LEN. Default: no limit"
#> [144] " --max-n COUNT Discard reads with more than COUNT 'N' bases. If COUNT"
#> [145] " is a number between 0 and 1, it is interpreted as a"
#> [146] " fraction of the read length."
#> [147] " --max-expected-errors ERRORS, --max-ee ERRORS"
#> [148] " Discard reads whose expected number of errors (computed"
#> [149] " from quality values) exceeds ERRORS."
#> [150] " --max-average-error-rate ERROR_RATE, --max-aer ERROR_RATE"
#> [151] " as --max-expected-errors (see above), but divided by"
#> [152] " length to account for reads of varying length."
#> [153] " --discard-trimmed, --discard"
#> [154] " Discard reads that contain an adapter. Use also -O to"
#> [155] " avoid discarding too many randomly matching reads."
#> [156] " --discard-untrimmed, --trimmed-only"
#> [157] " Discard reads that do not contain an adapter."
#> [158] " --discard-casava Discard reads that did not pass CASAVA filtering (header"
#> [159] " has :Y:)."
#> [160] ""
#> [161] "Output:"
#> [162] " --quiet Print only error messages."
#> [163] " --report {full,minimal}"
#> [164] " Which type of report to print: 'full' or 'minimal'."
#> [165] " Default: full"
#> [166] " --json FILE Dump report in JSON format to FILE"
#> [167] " -o FILE, --output FILE"
#> [168] " Write trimmed reads to FILE. FASTQ or FASTA format is"
#> [169] " chosen depending on input. Summary report is sent to"
#> [170] " standard output. Use '{name}' for demultiplexing (see"
#> [171] " docs). Default: write to standard output"
#> [172] " --fasta Output FASTA to standard output even on FASTQ input."
#> [173] " -Z Use compression level 1 for gzipped output files"
#> [174] " (faster, but uses more space)"
#> [175] " --info-file FILE Write information about each read and its adapter"
#> [176] " matches into FILE. See the documentation for the file"
#> [177] " format."
#> [178] " -r FILE, --rest-file FILE"
#> [179] " When the adapter matches in the middle of a read, write"
#> [180] " the rest (after the adapter) to FILE."
#> [181] " --wildcard-file FILE When the adapter has N wildcard bases, write adapter"
#> [182] " bases matching wildcard positions to FILE. (Inaccurate"
#> [183] " with indels.)"
#> [184] " --too-short-output FILE"
#> [185] " Write reads that are too short (according to length"
#> [186] " specified by -m) to FILE. Default: discard reads"
#> [187] " --too-long-output FILE"
#> [188] " Write reads that are too long (according to length"
#> [189] " specified by -M) to FILE. Default: discard reads"
#> [190] " --untrimmed-output FILE"
#> [191] " Write reads that do not contain any adapter to FILE."
#> [192] " Default: output to same file as trimmed reads"
#> [193] ""
#> [194] "Paired-end options:"
#> [195] " The -A/-G/-B/-U/-Q options work like their lowercase counterparts, but are"
#> [196] " applied to R2 (second read in pair)"
#> [197] ""
#> [198] " -A ADAPTER 3' adapter to be removed from R2"
#> [199] " -G ADAPTER 5' adapter to be removed from R2"
#> [200] " -B ADAPTER 5'/3 adapter to be removed from R2"
#> [201] " -U LENGTH Remove LENGTH bases from R2"
#> [202] " -Q [5'CUTOFF,]3'CUTOFF"
#> [203] " Quality-trimming cutoff for R2. Default: same as for R1"
#> [204] " -L LENGTH Shorten R2 to LENGTH. Default: same as for R1"
#> [205] " -p FILE, --paired-output FILE"
#> [206] " Write R2 to FILE."
#> [207] " --pair-adapters Treat adapters given with -a/-A etc. as pairs. Either"
#> [208] " both or none are removed from each read pair."
#> [209] " --pair-filter {any,both,first}"
#> [210] " Which of the reads in a paired-end read have to match"
#> [211] " the filtering criterion in order for the pair to be"
#> [212] " filtered. Default: any"
#> [213] " --interleaved Read and/or write interleaved paired-end reads."
#> [214] " --untrimmed-paired-output FILE"
#> [215] " Write second read in a pair to this FILE when no adapter"
#> [216] " was found. Use with --untrimmed-output. Default: output"
#> [217] " to same file as trimmed reads"
#> [218] " --too-short-paired-output FILE"
#> [219] " Write second read in a pair to this file if pair is too"
#> [220] " short."
#> [221] " --too-long-paired-output FILE"
#> [222] " Write second read in a pair to this file if pair is too"
#> [223] " long."
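
The help text above only lists the options. As a rough sketch of how a trimming call might be assembled, the snippet below builds an argument vector from flags that appear in that help output (`-g`, `-e`, `-O`, `-m`, `-o`). It rests on several assumptions: that `cutadapt()` accepts a character vector of command-line arguments (the `"-h"` example suggests so, but this is not confirmed here), that the TSO sits at the 5' end of the read, and that the sequence and file names are placeholders to be replaced with your own.

```r
# Placeholder values: none of these come from the package documentation.
tso_seq   <- "ACGTACGTACGT"          # dummy sequence, NOT a real TSO
in_fastq  <- "reads.fastq"           # hypothetical input file
out_fastq <- "reads.trimmed.fastq"   # hypothetical output file

# Build the argument vector using only options shown in the help text.
args <- c(
  "-g", tso_seq,    # 5' adapter: the adapter and any preceding bases are trimmed
  "-e", "0.1",      # maximum allowed error rate (the documented default)
  "-O", "3",        # minimum overlap between read and adapter (the documented default)
  "-m", "20",       # discard reads shorter than 20 bases after trimming
  "-o", out_fastq,  # write trimmed reads here
  in_fastq
)

# Preview the equivalent command line before running anything.
cat("cutadapt", paste(args, collapse = " "), "\n")

# Assumption: the wrapper forwards a character vector to the cutadapt binary.
# Uncomment once the input file exists:
# cutadapt(args)
```

If the TSO can also appear at the 3' end, `-a` would be the option to consider. The help text recommends `-b` only when the adapter's end is genuinely unknown.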