org.apache.mahout.classifier.evaluation
Class Auc

java.lang.Object
  extended by org.apache.mahout.classifier.evaluation.Auc

public class Auc
extends Object

Computes AUC and a few other accuracy statistics without storing large amounts of data. It does this by keeping uniform samples of the positive and negative scores. When AUC is requested, the retained scores are sorted and a rank-sum statistic is used to compute it. Because the expected AUC is unchanged by uniform down-sampling of either the positives or the negatives, the result is a close approximation in general, and it is exact when no more than maxBufferSize positive scores and no more than maxBufferSize negative scores have been examined.


Constructor Summary
Auc()
           
Auc(double threshold)
          Allocates a new data structure for accumulating information about AUC and a few other accuracy measures.
 
Method Summary
 void add(int trueValue, double score)
          Adds a score to the AUC buffers.
 void add(int trueValue, int predictedClass)
           
 double auc()
          Computes the AUC of points seen so far.
 Matrix confusion()
          Returns the confusion matrix for the classifier at the configured threshold.
 Matrix entropy()
          Returns a matrix related to the confusion matrix and to the log-likelihood.
 boolean isProbabilityScore()
           
 void setMaxBufferSize(int maxBufferSize)
           
 void setProbabilityScore(boolean probabilityScore)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Auc

public Auc(double threshold)
Allocates a new data structure for accumulating information about AUC and a few other accuracy measures.

Parameters:
threshold - The threshold to use in computing the confusion matrix.

Auc

public Auc()

Method Detail

add

public void add(int trueValue,
                double score)
Adds a score to the AUC buffers.

Parameters:
trueValue - The actual class of this example, that is, whether the score belongs to a positive or a negative example.
score - The score for this example.
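
The class description says that uniform samples of the scores are kept so that memory stays bounded. One common way to do that is reservoir sampling, sketched below. This is an illustration under stated assumptions, not the code used by Auc: the class ScoreReservoir, its constructor arguments, and its methods are hypothetical names introduced here, and one such reservoir would be kept for each class.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not part of Mahout: keeps a uniform sample of at most
// `capacity` scores from a stream of unknown length (reservoir sampling).
final class ScoreReservoir {
    private final double[] buffer;
    private final Random random;
    private int size; // number of scores currently retained
    private int seen; // number of scores offered so far

    ScoreReservoir(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.buffer = new double[capacity];
        this.random = new Random(seed);
    }

    void add(double score) {
        seen++;
        if (size < buffer.length) {
            // Until the buffer is full, every score is retained.
            buffer[size++] = score;
            return;
        }
        // Afterwards the new score replaces a random slot with probability
        // capacity / seen, which keeps the retained set a uniform sample.
        int slot = random.nextInt(seen);
        if (slot < buffer.length) {
            buffer[slot] = score;
        }
    }

    int size() {
        return size;
    }

    int seen() {
        return seen;
    }

    // Copy of the retained scores, in buffer order.
    double[] retained() {
        return Arrays.copyOf(buffer, size);
    }
}
```

While fewer scores than the capacity have been offered, nothing is discarded, which matches the statement that the result is exact for small inputs.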

add

public void add(int trueValue,
                int predictedClass)

auc

public double auc()
Computes the AUC of points seen so far. This can be moderately expensive because all retained points must be sorted.

Returns:
The area under the receiver operating characteristic (ROC) curve.
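
As an illustration of the rank-sum approach named in the class description, the sketch below computes AUC from two arrays of scores using the Mann-Whitney form: the sum of the ranks of the positive scores, minus n1(n1 + 1)/2, divided by n0 * n1. It is a minimal standalone example and not the Mahout implementation; the class RankSumAuc and its method are hypothetical, and giving tied scores their average rank is a choice made for this sketch.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Mahout: rank-sum (Mann-Whitney) AUC.
final class RankSumAuc {
    static double auc(double[] negatives, double[] positives) {
        int n0 = negatives.length;
        int n1 = positives.length;
        if (n0 == 0 || n1 == 0) {
            throw new IllegalArgumentException("need at least one score per class");
        }
        double[] neg = negatives.clone();
        double[] pos = positives.clone();
        Arrays.sort(neg);
        Arrays.sort(pos);

        int i = 0;
        int j = 0;
        double nextRank = 1;  // rank of the next unprocessed score (1-based)
        double rankSum = 0;   // sum of the ranks of the positive scores
        while (i < n0 || j < n1) {
            // Smallest unprocessed value across both sorted arrays.
            double v = (j >= n1 || (i < n0 && neg[i] <= pos[j])) ? neg[i] : pos[j];
            if (Double.isNaN(v)) {
                throw new IllegalArgumentException("scores must not be NaN");
            }
            int negTies = 0;
            int posTies = 0;
            while (i < n0 && neg[i] == v) { i++; negTies++; }
            while (j < n1 && pos[j] == v) { j++; posTies++; }
            int ties = negTies + posTies;
            // All scores equal to v share the average of the ranks they span.
            double averageRank = nextRank + (ties - 1) / 2.0;
            rankSum += posTies * averageRank;
            nextRank += ties;
        }
        return (rankSum - (double) n1 * (n1 + 1) / 2.0) / ((double) n0 * n1);
    }
}
```

With negatives {0.1, 0.6} and positives {0.4, 0.9}, the positives hold ranks 2 and 4, so the result is (6 - 3) / 4 = 0.75, which is also the fraction of positive-negative pairs that are ordered correctly.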

confusion

public Matrix confusion()
Returns the confusion matrix for the classifier at the configured threshold.

Returns:
The confusion matrix.
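
To make the role of the threshold concrete, here is a small sketch of how thresholded scores can be tallied into a 2 x 2 confusion matrix. Several details are assumptions of the sketch rather than documented behavior of Auc: rows index the true class and columns the predicted class, class labels are 0 and 1, and a score strictly greater than the threshold is predicted as class 1. The class ThresholdConfusion is hypothetical.

```java
// Hypothetical helper, not part of Mahout: confusion counts at a fixed threshold.
final class ThresholdConfusion {
    private final double threshold;
    // counts[trueValue][predictedClass]; this orientation is an assumption.
    private final long[][] counts = new long[2][2];

    ThresholdConfusion(double threshold) {
        this.threshold = threshold;
    }

    void add(int trueValue, double score) {
        if (trueValue != 0 && trueValue != 1) {
            throw new IllegalArgumentException("trueValue must be 0 or 1");
        }
        // Assumed rule: scores strictly above the threshold are predicted positive.
        int predictedClass = score > threshold ? 1 : 0;
        counts[trueValue][predictedClass]++;
    }

    long count(int trueValue, int predictedClass) {
        return counts[trueValue][predictedClass];
    }
}
```

Under these assumptions, count(1, 1) holds the true positives, count(0, 0) the true negatives, count(0, 1) the false positives, and count(1, 0) the false negatives.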

entropy

public Matrix entropy()
Returns a matrix related to the confusion matrix and to the log-likelihood. For a fairly accurate classifier, N + entropy is nearly the same as the confusion matrix because log(1 - eps) ≈ -eps when eps is small. For lower-accuracy classifiers, this measure gives a better picture of how things work out. Also, by definition, log-likelihood = sum(diag(entropy)).

Returns:
A cell-by-cell breakdown of the log-likelihood.
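
The identity log-likelihood = sum(diag(entropy)) can be illustrated with a small sketch. The cell definition used here is an assumption chosen so that the identity holds, and the normalization used by Auc may differ: for examples whose true class is t, cell (t, 0) accumulates log(1 - p) and cell (t, 1) accumulates log(p), where p is the score read as the probability of class 1. The class LogLikelihoodCells and the clamping constant EPS are hypothetical.

```java
// Hypothetical helper, not part of Mahout: per-cell log-likelihood sums.
final class LogLikelihoodCells {
    // Assumed clamp that keeps log() finite for scores of exactly 0 or 1.
    private static final double EPS = 1.0e-12;

    // cells[trueValue][assumedClass]: summed log-probability given to that class.
    private final double[][] cells = new double[2][2];
    private double logLikelihood;

    void add(int trueValue, double probability) {
        if (trueValue != 0 && trueValue != 1) {
            throw new IllegalArgumentException("trueValue must be 0 or 1");
        }
        double p = Math.max(EPS, Math.min(probability, 1.0 - EPS));
        double logNegative = Math.log1p(-p); // log-probability of class 0
        double logPositive = Math.log(p);    // log-probability of class 1
        cells[trueValue][0] += logNegative;
        cells[trueValue][1] += logPositive;
        // Only the term for the true class enters the log-likelihood.
        logLikelihood += (trueValue == 1) ? logPositive : logNegative;
    }

    double cell(int trueValue, int assumedClass) {
        return cells[trueValue][assumedClass];
    }

    double diagonalSum() {
        return cells[0][0] + cells[1][1];
    }

    double logLikelihood() {
        return logLikelihood;
    }
}
```

The diagonal cells collect exactly the terms in which the model's probability for the true class appears, so their sum equals the log-likelihood accumulated directly.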

setMaxBufferSize

public void setMaxBufferSize(int maxBufferSize)
Sets the maximum number of positive scores and of negative scores retained for the AUC computation.

isProbabilityScore

public boolean isProbabilityScore()

setProbabilityScore

public void setProbabilityScore(boolean probabilityScore)


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.