BSgenome-utils {BSgenome} | R Documentation |
Utilities for BSgenome objects.
## S4 method for signature 'BSgenome': matchPWM(pwm, subject, min.score = "80%", exclude = "", maskList = logical(0), asRangedData = TRUE) ## S4 method for signature 'BSgenome': countPWM(pwm, subject, min.score = "80%", exclude = "", maskList = logical(0)) ## S4 method for signature 'BSgenome': vmatchPattern(pattern, subject, max.mismatch = 0, min.mismatch = 0, with.indels = FALSE, fixed = TRUE, algorithm = "auto", exclude = "", maskList = logical(0), userMask = RangesList(), invertUserMask = FALSE, asRangedData = TRUE) ## S4 method for signature 'BSgenome': vcountPattern(pattern, subject, max.mismatch = 0, min.mismatch = 0, with.indels = FALSE, fixed = TRUE, algorithm = "auto", exclude = "", maskList = logical(0), userMask = RangesList(), invertUserMask = FALSE) ## S4 method for signature 'BSgenome': vmatchPDict(pdict, subject, max.mismatch = 0, min.mismatch = 0, fixed = TRUE, algorithm = "auto", verbose = FALSE, exclude = "", maskList = logical(0), asRangedData = TRUE) ## S4 method for signature 'BSgenome': vcountPDict(pdict, subject, max.mismatch = 0, min.mismatch = 0, fixed = TRUE, algorithm = "auto", collapse = FALSE, weight = 1L, verbose = FALSE, exclude = "", maskList = logical(0))
pwm |
A numeric matrix with row names A, C, G and T representing a Position Weight Matrix. |
pattern |
A DNAString object containing the pattern sequence. |
pdict |
A DNAStringSet object containing the pattern sequences. |
subject |
A BSgenome object containing the subject sequences. |
min.score |
The minimum score for counting a match.
Can be given as a character string containing a percentage (e.g.
"85%" ) of the highest possible score or as a single number.
|
max.mismatch, min.mismatch |
The maximum and minimum number of mismatching letters allowed (see
?`lowlevel-matching` for the details).
If non-zero, an inexact matching algorithm is used.
|
with.indels |
If TRUE then indels are allowed. In that case, min.mismatch
must be 0 and max.mismatch is interpreted as the maximum
"edit distance" allowed between the pattern and a match.
Note that in order to avoid pollution by redundant matches,
only the "best local matches" are returned.
Roughly speaking, a "best local match" is a match that is locally
both the closest (to the pattern P) and the shortest.
More precisely, a substring S' of the subject S is a "best local match" iff:
(a) nedit(P, S') <= max.mismatch (b) for every substring S1 of S': nedit(P, S1) > nedit(P, S') (c) for every substring S2 of S that contains S': nedit(P, S2) <= nedit(P, S')One nice property of "best local matches" is that their first and last letters are guaranteed to be aligned with letters in P (i.e. they match letters in P). |
fixed |
If FALSE then IUPAC extended letters are interpreted as ambiguities
(see ?`lowlevel-matching` for the details).
|
algorithm |
For vmatchPattern and vcountPattern one of the following:
"auto" , "naive-exact" , "naive-inexact" ,
"boyer-moore" , "shift-or" , or "indels" .
For |
collapse, weight |
ignored arguments. |
verbose |
TRUE or FALSE .
|
exclude |
A character vector with strings that will be used to filter out chromosomes whose names match these strings. |
maskList |
A named logical vector of maskStates preferred when used with a BSGenome object. When using the bsapply function, the masks will be set to the states in this vector. |
userMask |
A RangesList, containing a mask to be applied to each
chromosome. See bsapply .
|
invertUserMask |
Whether the userMask should be inverted.
|
asRangedData |
A logical value to assist in migrating output type from
RangedData (deprecated) to
GRanges. Should be FALSE .
If TRUE , a warning message is issued and a RangedData object
is returned.
|
A GRanges object for matchPWM
with two
elementMetadata columns: "score" (numeric), and "string" (DNAStringSet).
A GRanges object for vmatchPattern
.
A GRanges object for vmatchPDict
with
one elementMetadata column: "index", which represents a mapping to a
position in the original pattern dictionary.
A data.frame object for countPWM
and vcountPattern
with three columns: "seqname" (factor), "strand" (factor), and
"count" (integer).
A DataFrame object for vcountPDict
with four
columns: "seqname" ('factor' Rle), "strand" ('factor' Rle),
"index" (integer) and "count" ('integer' Rle). As with vmatchPDict
the index column represents a mapping to a position in the original
pattern dictionary.
P. Aboyoun
matchPWM
,
matchPattern
,
matchPDict
,
bsapply
library(BSgenome.Celegans.UCSC.ce2) data(HNF4alpha) pwm <- PWM(HNF4alpha) matchPWM(pwm, Celegans, asRangedData = FALSE) countPWM(pwm, Celegans) pattern <- consensusString(HNF4alpha) vmatchPattern(pattern, Celegans, fixed = "subject", asRangedData = FALSE) vcountPattern(pattern, Celegans, fixed = "subject") vmatchPDict(HNF4alpha[1:10], Celegans, asRangedData = FALSE) vcountPDict(HNF4alpha[1:10], Celegans)