Finding exact matches of a query in a fasta_file
KmerGMA.firstMatch — Function
firstMatch(reader::FASTX.FASTA.Reader, query::LongSequence{DNAAlphabet{4}})
Scans through a FASTA.Reader object to find the first occurrence of the query (a DNA LongSequence) and prints the result to the REPL.
This function is mostly useful as a quick check for whether matches are present.
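The idea of stopping at the first hit can be illustrated with a minimal sketch. It is not the KmerGMA implementation: it uses only Julia's Base string search, plain strings instead of BioSequences types, and a hypothetical list of (identifier, sequence) pairs standing in for a FASTA reader. The name first_match_sketch is made up for this illustration.

```julia
# Hypothetical stand-in for a FASTA reader: (identifier, sequence) pairs as plain strings.
records = [("chr1", "GGGTTTAAAC"), ("chr2", "TTACGTACGT"), ("chr3", "ACGTACGT")]

# Return the identifier and position range of the first record containing `query`,
# or `nothing` if no record contains it. Illustrative only, not KmerGMA.firstMatch.
function first_match_sketch(records, query::AbstractString)
    for (id, seq) in records
        hit = findfirst(query, seq)   # Base.findfirst returns a UnitRange or nothing
        if hit !== nothing
            return (id, hit)          # stop scanning at the first occurrence
        end
    end
    return nothing
end

result = first_match_sketch(records, "ACGT")
println(result)
```

Because the scan returns as soon as one record matches, later records (here "chr3") are never searched, which is what makes this kind of check cheap compared with collecting every match.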
KmerGMA.exactMatch — Function
exactMatch(query, subject_seq, overlap::Bool = true)
Finds all exact matches of a query sequence (a DNA LongSequence) in the given subject, which can be a genome assembly supplied as a reader object or a single sequence.
query can be a FASTA record, a substring of a DNA sequence, or a DNA LongSequence.
subject_seq can be a FASTA record, a DNA (sub)sequence, or a FASTA reader.
overlap is a Boolean argument and is true by default.
Returns a dictionary that maps the identifier of each record containing matches to the match locations in that record.
The algorithm is based on the BioSequences findfirst() function and runs quite fast through entire genomes.
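As a rough illustration of repeated first-match searching and of the dictionary-shaped result, here is a minimal sketch. It is not the KmerGMA implementation: it uses Base.findnext on plain strings rather than BioSequences, the names all_matches_sketch and exact_match_sketch are hypothetical, and it assumes that overlap = false means each new search starts after the end of the previous match.

```julia
# Collect every exact occurrence of `query` in `subject` by searching repeatedly.
# Illustrative only; plain strings stand in for DNA sequences.
function all_matches_sketch(query::AbstractString, subject::AbstractString; overlap::Bool = true)
    hits = UnitRange{Int}[]
    isempty(query) && return hits            # an empty query has no meaningful matches
    start = firstindex(subject)
    while start <= lastindex(subject)
        hit = findnext(query, subject, start)
        hit === nothing && break
        push!(hits, hit)
        # Assumed semantics: with overlap, resume one position after the match start;
        # without overlap, resume just past the match end.
        start = overlap ? first(hit) + 1 : last(hit) + 1
    end
    return hits
end

# Map record identifier => match locations, keeping only records with at least one match.
function exact_match_sketch(query, records; overlap::Bool = true)
    found = Dict{String, Vector{UnitRange{Int}}}()
    for (id, seq) in records
        hits = all_matches_sketch(query, seq; overlap = overlap)
        isempty(hits) || (found[id] = hits)
    end
    return found
end

records = [("rec1", "AAAA"), ("rec2", "CCAAC"), ("rec3", "CCCC")]
with_overlap = exact_match_sketch("AA", records)
without_overlap = exact_match_sketch("AA", records; overlap = false)
```

In this toy input, "rec1" yields three overlapping hits but only two non-overlapping ones, and "rec3" is absent from the result because it contains no match.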