|
parseArgs(data,
targetClass,
otherClass=None,
**args)
parse arguments for a feature scoring function |
|
|
|
singleFeatureSuccRate(data,
targetClass,
otherClass=None,
**args) |
|
|
|
predictivity(data,
targetClass,
otherClass=None,
**args)
A feature score for discrete data; the score for feature i is:
s_i = P(Fi | C1) - P(Fi | C2),
where P(Fi | C) is the estimated probability of Feature i being nonzero given
the class variable
This is estimated as:
s_i = # of patterns in target class that have feature i /
no. |
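The difference of class-conditional probabilities in the first formula can be sketched in plain Python. This is an illustration only, not PyML's implementation: `predictivity_sketch`, the list-of-rows input `X`, and the label list `y` are hypothetical names, and both classes are assumed to be non-empty.

```python
def predictivity_sketch(X, y, target_class, other_class):
    """Return s_i = P(Fi | C1) - P(Fi | C2) for each feature i.

    X is a list of rows (one per pattern); y holds the label of each row.
    P(Fi | C) is estimated as the fraction of patterns in class C whose
    feature i is nonzero. Both classes are assumed to be non-empty.
    """
    n_features = len(X[0])

    def nonzero_fraction(cls):
        rows = [row for row, label in zip(X, y) if label == cls]
        return [sum(1 for row in rows if row[i] != 0) / len(rows)
                for i in range(n_features)]

    p1 = nonzero_fraction(target_class)
    p2 = nonzero_fraction(other_class)
    return [a - b for a, b in zip(p1, p2)]


X = [[1, 0], [1, 1], [0, 1], [0, 1]]
y = [1, 1, 0, 0]
scores = predictivity_sketch(X, y, target_class=1, other_class=0)
print(scores)  # [1.0, -0.5]
```

A positive score marks a feature that occurs more often in the target class; a negative score marks one that is more typical of the other class.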
|
|
|
countDiff(data,
targetClass,
otherClass=None,
**args)
A feature score for discrete data; the score for feature i is:
s_i = (#(Fi | C ) - #(Fi | not C)) / #(Fi | C) |
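As a rough illustration of the count difference, here is a plain-Python sketch. The name `count_diff_sketch` and the list-based inputs are hypothetical, and returning 0.0 when the feature never occurs in the target class is an assumption made here to avoid dividing by zero; the library's behavior in that case is not documented above.

```python
def count_diff_sketch(X, y, target_class):
    """Return s_i = (#(Fi | C) - #(Fi | not C)) / #(Fi | C) per feature."""
    scores = []
    for i in range(len(X[0])):
        # Patterns with feature i nonzero, inside and outside the target class.
        in_class = sum(1 for row, label in zip(X, y)
                       if label == target_class and row[i] != 0)
        out_class = sum(1 for row, label in zip(X, y)
                        if label != target_class and row[i] != 0)
        # Assumption: score 0.0 when the feature is absent from the class.
        scores.append((in_class - out_class) / in_class if in_class else 0.0)
    return scores


X = [[1, 0], [1, 1], [0, 1], [1, 0]]
y = [1, 1, 0, 0]
scores = count_diff_sketch(X, y, target_class=1)
print(scores)  # [0.5, 0.0]
```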
|
|
|
sensitivity(data,
targetClass,
otherClass=None,
**args)
A feature score for discrete data
(alternatively, with a threshold it could be used for continuous data)
s_i = #(Fi | C) / #(C) |
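The sketch below shows one way the same ratio could cover both cases mentioned above: a feature counts as present when its value exceeds a threshold. The function name and the `threshold` parameter are hypothetical, and the target class is assumed to be non-empty; this is not the library's code.

```python
def sensitivity_sketch(X, y, target_class, threshold=0):
    """Return s_i = #(Fi | C) / #(C) for each feature i.

    A feature is treated as present when its value exceeds `threshold`.
    With the default of 0 this matches "nonzero" for non-negative data.
    """
    rows = [row for row, label in zip(X, y) if label == target_class]
    return [sum(1 for row in rows if row[i] > threshold) / len(rows)
            for i in range(len(X[0]))]


X = [[0.9, 0.0], [0.2, 0.7], [0.8, 0.6], [0.1, 0.9]]
y = [1, 1, 1, 0]
nonzero_scores = sensitivity_sketch(X, y, target_class=1)
thresholded_scores = sensitivity_sketch(X, y, target_class=1, threshold=0.5)
```

With the default threshold the first feature is present in all three target patterns; raising the threshold to 0.5 drops the pattern whose value is 0.2.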
|
|
|
ppv(data,
targetClass,
otherClass=None,
**args)
A feature score for discrete data
s_i = #(Fi | C) / #(Fi) |
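A minimal sketch of this ratio in plain Python follows. `ppv_sketch` and its list-based inputs are hypothetical, and scoring a feature that never occurs as 0.0 is an assumption made for the example.

```python
def ppv_sketch(X, y, target_class):
    """Return s_i = #(Fi | C) / #(Fi) for each feature i."""
    scores = []
    for i in range(len(X[0])):
        # Patterns in the target class that have feature i.
        in_class = sum(1 for row, label in zip(X, y)
                       if row[i] != 0 and label == target_class)
        # All patterns that have feature i, regardless of class.
        total = sum(1 for row in X if row[i] != 0)
        # Assumption: a feature that never occurs gets a score of 0.0.
        scores.append(in_class / total if total else 0.0)
    return scores


X = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 1, 0]]
y = [1, 1, 0, 0]
scores = ppv_sketch(X, y, target_class=1)
```

Here the first feature occurs in three patterns, two of them in the target class, so its score is 2/3; the third feature never occurs and scores 0.0.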
|
|
|
ppvThreshold(data,
targetClass,
otherClass=None,
**args)
A feature score for discrete data
s_i = #(Fi | C) / #(Fi) if #(Fi | C) > threshold and 0 otherwise |
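The thresholded variant can be sketched the same way. Again the function name and inputs are hypothetical; the only point illustrated is that a feature must occur in more than `threshold` target-class patterns before it receives a nonzero score.

```python
def ppv_threshold_sketch(X, y, target_class, threshold):
    """Return #(Fi | C) / #(Fi) if #(Fi | C) > threshold, else 0."""
    scores = []
    for i in range(len(X[0])):
        in_class = sum(1 for row, label in zip(X, y)
                       if row[i] != 0 and label == target_class)
        total = sum(1 for row in X if row[i] != 0)
        # The `total` check only matters for negative thresholds.
        if in_class > threshold and total:
            scores.append(in_class / total)
        else:
            scores.append(0.0)
    return scores


X = [[1, 1], [1, 0], [1, 0], [0, 1]]
y = [1, 1, 1, 0]
scores = ppv_threshold_sketch(X, y, target_class=1, threshold=1)
print(scores)  # [1.0, 0.0]
```

The second feature would score 0.5 without the threshold, but it occurs in only one target-class pattern, so it is zeroed out.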
|
|
|
specificity(data,
targetClass,
otherClass=None,
**args)
A feature score for discrete data
s_i = #(Fi | C) / #(Fi) |
|
|
|
usefullness(data,
targetClass,
otherClass=None,
**args)
A feature score for discrete data
optional arguments:
threshold
fraction |
|
|
|
abundance(data,
targetClass,
otherClass=None,
**args)
Fraction of patterns that have a feature: A(F,C) = #(F | C) / #(C) |
|
|
|
oddsRatio(data,
targetClass,
otherClass=None,
**args) |
|
|
|
logOddsRatio(data,
targetClass,
otherClass=None,
**args) |
|
|
|
|
|
golub(data,
targetClass,
otherClass,
**args)
The Golub feature score:
s = (mu1 - mu2) / sqrt(sigma1^2 + sigma2^2) |
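The formula as written above can be sketched with the standard library. This is not PyML's implementation: `golub_sketch` is a hypothetical name, the use of the population variance is an assumption (the documentation does not say which estimator is used), and a feature with zero spread in both classes is scored 0.0 here to avoid dividing by zero.

```python
import math
import statistics


def golub_sketch(X, y, target_class, other_class):
    """Return s = (mu1 - mu2) / sqrt(sigma1^2 + sigma2^2) per feature."""
    scores = []
    for i in range(len(X[0])):
        a = [row[i] for row, label in zip(X, y) if label == target_class]
        b = [row[i] for row, label in zip(X, y) if label == other_class]
        # Assumption: population variance within each class.
        spread = math.sqrt(statistics.pvariance(a) + statistics.pvariance(b))
        diff = statistics.mean(a) - statistics.mean(b)
        # Assumption: constant features get a score of 0.0.
        scores.append(diff / spread if spread else 0.0)
    return scores


X = [[1.0, 5.0], [3.0, 5.0], [5.0, 5.0], [7.0, 5.0]]
y = [1, 1, 0, 0]
scores = golub_sketch(X, y, target_class=1, other_class=0)
```

For the first feature the class means are 2 and 6 and each class has variance 1, so the score is -4 / sqrt(2); the second feature is constant and scores 0.0.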
|
|
|
succ(data,
targetClass,
otherClass,
**args)
the score of feature j is the success rate of a classifier that
classifies into the target class all points whose value of the feature
is higher than some threshold (linear 1-d classifier). |
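The description does not say how the threshold is chosen. The sketch below assumes the best threshold is picked by trying every observed value of the feature, which is only one plausible reading; `succ_sketch` and its list-based inputs are hypothetical.

```python
def succ_sketch(X, y, target_class):
    """Success rate of the best 1-d threshold classifier for each feature.

    A point is assigned to the target class when its feature value is
    strictly greater than the threshold.
    """
    n = len(X)
    scores = []
    for i in range(len(X[0])):
        values = [row[i] for row in X]
        # Assumption: candidate thresholds are one value below the minimum
        # (everything is predicted as target) plus each observed value.
        candidates = [min(values) - 1] + sorted(set(values))
        best = 0.0
        for t in candidates:
            correct = sum(1 for v, label in zip(values, y)
                          if (v > t) == (label == target_class))
            best = max(best, correct / n)
        scores.append(best)
    return scores


X = [[0.1, 0.5], [0.2, 0.1], [0.8, 0.4], [0.9, 0.2]]
y = [0, 0, 1, 1]
scores = succ_sketch(X, y, target_class=1)
print(scores)  # [1.0, 0.75]
```

The first feature separates the classes perfectly at a threshold of 0.2; the best the second feature can do is three correct points out of four.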
|
|
|
balancedSucc(data,
targetClass,
otherClass,
**args)
the score of feature j is the success rate of a classifier that
classifies into the target class all points whose value of the feature
is higher than some threshold (linear 1-d classifier). |
|
|
|
|
|
featureCount(data,
*options,
**args)
returns a vector where component i gives the number of patterns where
feature i is nonzero
INPUTS:
data - a dataset
targetClass - class for which to count (optional, default behavior is
to look at all patterns)
Y - alternative label vector (optional)
feature - either a feature or list of features - counts the number of
patterns for which the feature or list of features is non-zero
I - a list of indices on which to do feature count
OPTIONS:
"complement" - look at the complement of the target class |
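The counting behavior described for featureCount, including the "complement" option, can be illustrated with a small plain-Python sketch. The signature below is hypothetical and much simpler than the documented one: it takes a list of rows and a label list instead of a dataset object, and it omits the `feature` and `I` arguments.

```python
def feature_count_sketch(X, y=None, target_class=None, complement=False):
    """Count, for each feature, the patterns in which it is nonzero.

    If target_class is given, only patterns of that class are counted;
    with complement=True, only patterns outside that class are counted.
    """
    counts = [0] * len(X[0])
    for row_index, row in enumerate(X):
        if target_class is not None:
            in_class = y[row_index] == target_class
            # Skip rows on the wrong side of the class selection.
            if in_class == complement:
                continue
        for i, value in enumerate(row):
            if value != 0:
                counts[i] += 1
    return counts


X = [[1, 0, 2], [0, 0, 3], [4, 5, 0]]
y = [1, 0, 1]
all_counts = feature_count_sketch(X)
class_counts = feature_count_sketch(X, y, target_class=1)
other_counts = feature_count_sketch(X, y, target_class=1, complement=True)
print(all_counts, class_counts, other_counts)  # [2, 1, 2] [2, 1, 1] [0, 0, 1]
```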
|
|
|
featureMean(data,
targetClass=None,
Y=None)
returns a vector where component i is the mean of feature i
INPUT:
data - a dataset
targetClass - class for which to take the mean (optional)
Y - alternative label vector (optional) |
|
|
|
featureStd(data,
targetClass=None,
Y=None)
returns a vector where component i is the standard deviation of feature i
INPUT:
data - a dataset
targetClass - class for which to take the standard deviation (optional)
Y - alternative label vector (optional) |
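Per-feature means and standard deviations, optionally restricted to one class, can be sketched with the standard library as follows. The function name and list-based inputs are hypothetical, and the population standard deviation is an assumption; the documentation above does not state which estimator featureStd uses.

```python
import statistics


def feature_mean_std_sketch(X, y=None, target_class=None):
    """Return (means, stds), one entry per feature.

    If target_class is given, only rows with that label are used.
    """
    if target_class is None:
        rows = X
    else:
        rows = [row for row, label in zip(X, y) if label == target_class]
    columns = list(zip(*rows))  # transpose: one tuple per feature
    means = [statistics.mean(col) for col in columns]
    # Assumption: population standard deviation.
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


X = [[1.0, 2.0], [3.0, 2.0], [5.0, 8.0]]
y = [1, 1, 0]
means, stds = feature_mean_std_sketch(X, y, target_class=1)
```

Restricted to class 1, the first feature has mean 2 and standard deviation 1, while the second is constant at 2.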
|
|
|
eliminateSparseFeatures(data,
threshold)
removes from the data features whose feature count is below a threshold
data - a dataset
threshold - number of occurrences of the feature below which it will be
eliminated |
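The filtering rule can be illustrated in a few lines. Unlike the documented function, which removes features from the dataset itself, this hypothetical sketch returns a reduced copy together with the indices of the features that were kept.

```python
def eliminate_sparse_features_sketch(X, threshold):
    """Drop features that are nonzero in fewer than `threshold` patterns.

    Returns (reduced_rows, kept_feature_indices).
    """
    n_features = len(X[0])
    counts = [sum(1 for row in X if row[i] != 0) for i in range(n_features)]
    kept = [i for i in range(n_features) if counts[i] >= threshold]
    reduced = [[row[i] for i in kept] for row in X]
    return reduced, kept


X = [[1, 0, 1], [1, 0, 0], [1, 1, 0]]
reduced, kept = eliminate_sparse_features_sketch(X, threshold=2)
print(kept)     # [0]
print(reduced)  # [[1], [1], [1]]
```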
|
|
|
nonredundantFeatures(data,
w=None)
Compute a set of nonredundant features for a 0/1 sparse dataset.
A feature is defined as redundant if there is another feature that is
nonzero for exactly the same patterns and has a larger weight.
INPUT: a dataset and, optionally, a list of weights, one for each
feature in the data. |
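The redundancy definition above amounts to grouping features by the set of patterns in which they are nonzero and keeping one feature per group. The sketch below is a hypothetical illustration of that idea; in particular, keeping the first feature when weights are tied or missing is an assumption, not documented behavior.

```python
def nonredundant_features_sketch(X, w=None):
    """Return indices of features that are not redundant.

    Features that are nonzero for exactly the same patterns form a group;
    only the feature with the largest weight in each group is kept.
    """
    n_features = len(X[0])
    if w is None:
        w = [1.0] * n_features  # assumption: equal weights by default
    best = {}  # support (tuple of row indices) -> best feature index so far
    for i in range(n_features):
        support = tuple(r for r, row in enumerate(X) if row[i] != 0)
        # Assumption: on ties the earlier feature is kept.
        if support not in best or w[i] > w[best[support]]:
            best[support] = i
    return sorted(best.values())


X = [[1, 1, 0], [0, 0, 1], [1, 1, 1]]
weighted = nonredundant_features_sketch(X, w=[0.2, 0.9, 0.5])
unweighted = nonredundant_features_sketch(X)
print(weighted, unweighted)  # [1, 2] [0, 2]
```

The first two features occur in the same patterns, so only one of them survives: the heavier one when weights are given, the earlier one otherwise.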
|
|
|
|
|
|
|
|
|
featureReport(data,
score='roc',
targetClass=1,
otherClass=0) |
|
|