org.apache.mahout.clustering.cdbw
Class CDbwEvaluator

java.lang.Object
  extended by org.apache.mahout.clustering.cdbw.CDbwEvaluator

public final class CDbwEvaluator
extends Object

This class calculates the CDbw metric as defined in http://www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf


Constructor Summary
CDbwEvaluator(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path clustersIn)
          Initialize a new instance from job information
CDbwEvaluator(Map<Integer,List<VectorWritable>> representativePoints, List<Cluster> clusters, DistanceMeasure measure)
          For testing only
 
Method Summary
 double getCDbw()
          Compute the CDbw validity metric (eqn 8).
 Map<Integer,Map<Integer,Double>> interClusterDensities()
          This function evaluates the density of points in the regions between each pair of clusters (eqn 1).
 double interClusterDensity()
          This function evaluates the average density of points in the regions between clusters (eqn 1).
 Vector intraClusterDensities()
          The average density within clusters is defined as the percentage of representative points that reside in the neighborhood of the clusters' centers.
 double intraClusterDensity()
          The average density within clusters is defined as the percentage of representative points that reside in the neighborhood of the clusters' centers.
 double separation()
          Calculate the separation of clusters (eqn 4), taking into account both the distances between the clusters' closest points and the inter-cluster density.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CDbwEvaluator

public CDbwEvaluator(Map<Integer,List<VectorWritable>> representativePoints,
                     List<Cluster> clusters,
                     DistanceMeasure measure)
For testing only

Parameters:
representativePoints - a Map<Integer,List<VectorWritable>> of representative points keyed by clusterId
clusters - a List<Cluster> of the clusters
measure - an appropriate DistanceMeasure

CDbwEvaluator

public CDbwEvaluator(org.apache.hadoop.conf.Configuration conf,
                     org.apache.hadoop.fs.Path clustersIn)
Initialize a new instance from job information

Parameters:
conf - a Configuration with appropriate parameters
clustersIn - a Path to the input clusters directory

Method Detail

getCDbw

public double getCDbw()
Compute the CDbw validity metric (eqn 8). The goal of this metric is to reward clusterings that have both a high intra-cluster density and a high cluster separation.

Returns:
the CDbw validity metric as a double
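
As a rough illustration of how the metric rewards both properties at once, the sketch below combines a density value and a separation value by multiplication. The product form is an assumption based on the general shape of the cited validity index, not a statement about this class's internals; CdbwSketch and cdbw are hypothetical names that do not exist in Mahout.

```java
// Illustrative only: not Mahout code.
// Assumption: the validity metric is the product of the average
// intra-cluster density and the cluster separation.
final class CdbwSketch {
    static double cdbw(double intraDensity, double separation) {
        return intraDensity * separation;
    }

    public static void main(String[] args) {
        // A compact, well-separated clustering...
        System.out.println(cdbw(0.9, 4.0));
        // ...scores higher than a diffuse, poorly separated one.
        System.out.println(cdbw(0.4, 1.5));
    }
}
```

Under this assumption a clustering cannot score well on density alone: a separation near zero pulls the whole metric toward zero, and vice versa.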

intraClusterDensity

public double intraClusterDensity()
The average density within clusters is defined as the percentage of representative points that reside in the neighborhood of the clusters' centers. The goal is for the density within clusters to be high (eqn 5).

Returns:
the average intra-cluster density as a double
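
The definition above can be sketched in a few lines of plain Java. This is a minimal illustration, not the Mahout implementation: it assumes Euclidean distance and takes the neighborhood radius as an explicit parameter, because this page does not say how the neighborhood is sized. IntraDensitySketch and its methods are hypothetical names.

```java
// Illustrative only: not Mahout code.
final class IntraDensitySketch {
    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Fraction of a cluster's representative points within `radius` of its center.
    // Assumption: the "neighborhood" is a ball of the given radius.
    static double density(double[] center, double[][] repPoints, double radius) {
        if (repPoints.length == 0) {
            return 0.0;
        }
        int inside = 0;
        for (double[] p : repPoints) {
            if (distance(center, p) <= radius) {
                inside++;
            }
        }
        return (double) inside / repPoints.length;
    }

    // Unweighted mean of the per-cluster densities.
    static double averageDensity(double[][] centers, double[][][] repPoints, double[] radii) {
        if (centers.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int c = 0; c < centers.length; c++) {
            sum += density(centers[c], repPoints[c], radii[c]);
        }
        return sum / centers.length;
    }

    public static void main(String[] args) {
        double[] center = {0.0, 0.0};
        double[][] reps = {{1.0, 0.0}, {0.0, 1.0}, {3.0, 4.0}, {0.5, 0.5}};
        // Three of the four representative points lie within radius 2 of the center.
        System.out.println(density(center, reps, 2.0));
    }
}
```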

interClusterDensities

public Map<Integer,Map<Integer,Double>> interClusterDensities()
This function evaluates the density of points in the regions between each pair of clusters (eqn 1). The goal is for the density in the area between clusters to be low.

Returns:
a Map<Integer,Map<Integer,Double>> of the inter-cluster densities

separation

public double separation()
Calculate the separation of clusters (eqn 4), taking into account both the distances between the clusters' closest points and the inter-cluster density. The goal is for the distances between clusters to be high while the representative-point density in the areas between them is low.

Returns:
the cluster separation as a double
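
One way to combine the two ingredients named above is to sum the closest-point distances over all cluster pairs and damp the total by the inter-cluster density. The sketch below does that by dividing by (1 + density). That particular form is an assumption for illustration; consult eqn 4 of the cited paper for the authoritative definition. SeparationSketch and the closest matrix are hypothetical.

```java
// Illustrative only: not Mahout code.
final class SeparationSketch {
    // closest[i][j] holds the distance between the closest representative
    // points of clusters i and j (the diagonal is ignored).
    // Assumption: separation = sum of those distances / (1 + interDensity).
    static double separation(double[][] closest, double interDensity) {
        double sum = 0.0;
        for (int i = 0; i < closest.length; i++) {
            for (int j = 0; j < closest[i].length; j++) {
                if (i != j) {
                    sum += closest[i][j];
                }
            }
        }
        return sum / (1.0 + interDensity);
    }

    public static void main(String[] args) {
        double[][] closest = {{0.0, 3.0}, {3.0, 0.0}};
        // Same distances, but a denser gap between clusters lowers the score.
        System.out.println(separation(closest, 0.0));
        System.out.println(separation(closest, 0.5));
    }
}
```

With this form, larger gaps between clusters raise the value, while points scattered in the region between clusters lower it, which matches the stated goal.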

interClusterDensity

public double interClusterDensity()
This function evaluates the average density of points in the regions between clusters (eqn 1). The goal is for the density in the area between clusters to be low.

Returns:
the average inter-cluster density as a double
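
To show how a nested map of the shape returned by interClusterDensities() could be reduced to a single average, here is a minimal sketch. It assumes the outer and inner keys are cluster ids and that the average is an unweighted mean over all stored pairs; neither detail is confirmed by this page. InterDensitySketch is a hypothetical helper, not part of Mahout.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: not Mahout code.
final class InterDensitySketch {
    // Unweighted mean of all pairwise densities; 0.0 when there are no pairs.
    static double average(Map<Integer, Map<Integer, Double>> densities) {
        double sum = 0.0;
        int count = 0;
        for (Map<Integer, Double> row : densities.values()) {
            for (double d : row.values()) {
                sum += d;
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> densities = new HashMap<>();
        // Density between clusters 0 and 1, and between clusters 1 and 2.
        densities.computeIfAbsent(0, k -> new HashMap<>()).put(1, 0.25);
        densities.computeIfAbsent(1, k -> new HashMap<>()).put(2, 0.75);
        System.out.println(average(densities));
    }
}
```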

intraClusterDensities

public Vector intraClusterDensities()
The average density within clusters is defined as the percentage of representative points that reside in the neighborhood of the clusters' centers. The goal is for the density within clusters to be high (eqn 5).

Returns:
a Vector of the intra-cluster densities, one per clusterId


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.