org.apache.mahout.clustering.canopy
Class CanopyClusterer

java.lang.Object
  extended by org.apache.mahout.clustering.canopy.CanopyClusterer

public class CanopyClusterer
extends Object


Constructor Summary
CanopyClusterer(org.apache.hadoop.conf.Configuration config)
           
CanopyClusterer(DistanceMeasure measure, double t1, double t2)
           
 
Method Summary
 void addPointToCanopies(Vector point, Collection<Canopy> canopies)
          This is the same algorithm as the reference but inverted to iterate over existing canopies instead of the points.
 boolean canopyCovers(Canopy canopy, Vector point)
          Returns true if the point is covered by the canopy
 void config(DistanceMeasure aMeasure, double aT1, double aT2)
          Configure the Canopy for unit tests
 void configure(org.apache.hadoop.conf.Configuration configuration)
          Configure the Canopy and its distance measure
static List<Canopy> createCanopies(List<Vector> points, DistanceMeasure measure, double t1, double t2)
          Iterate through the points, adding new canopies.
static List<Vector> getCenters(Iterable<Canopy> canopies)
          Iterate through the canopies, adding their centroids to a list
 double getT1()
           
 double getT2()
           
 double getT3()
           
 double getT4()
           
static void updateCentroids(Iterable<Canopy> canopies)
          Iterate through the canopies, resetting their center to their centroids
 void useT3T4()
          Used by CanopyReducer to set t1=t3 and t2=t4 configuration values
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CanopyClusterer

public CanopyClusterer(DistanceMeasure measure,
                       double t1,
                       double t2)

CanopyClusterer

public CanopyClusterer(org.apache.hadoop.conf.Configuration config)
Method Detail

getT1

public double getT1()

getT2

public double getT2()

getT3

public double getT3()

getT4

public double getT4()

configure

public void configure(org.apache.hadoop.conf.Configuration configuration)
Configure the Canopy and its distance measure

Parameters:
configuration - the Configuration

useT3T4

public void useT3T4()
Used by CanopyReducer to set t1=t3 and t2=t4 configuration values
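
The threshold swap described above can be pictured with a small stand-in. This is only a sketch under stated assumptions, not the Mahout source: the class name ThresholdSketch and its four-argument constructor are hypothetical, and how the real class obtains its T3 and T4 values is not covered here.

```java
/** Hypothetical holder for the four thresholds; not a Mahout class. */
class ThresholdSketch {
    private double t1;
    private double t2;
    private final double t3;
    private final double t4;

    ThresholdSketch(double t1, double t2, double t3, double t4) {
        this.t1 = t1;
        this.t2 = t2;
        this.t3 = t3;
        this.t4 = t4;
    }

    /** Overwrites the working thresholds with the T3/T4 values (t1=t3, t2=t4). */
    void useT3T4() {
        t1 = t3;
        t2 = t4;
    }

    double getT1() { return t1; }
    double getT2() { return t2; }
    double getT3() { return t3; }
    double getT4() { return t4; }
}
```

After useT3T4() is called on such an object, getT1() and getT2() return the values that getT3() and getT4() return.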


config

public void config(DistanceMeasure aMeasure,
                   double aT1,
                   double aT2)
Configure the Canopy for unit tests

Parameters:
aMeasure - the DistanceMeasure
aT1 - the T1 distance threshold
aT2 - the T2 distance threshold

addPointToCanopies

public void addPointToCanopies(Vector point,
                               Collection<Canopy> canopies)
This is the same algorithm as the reference but inverted to iterate over existing canopies instead of the points. Because of this it does not need to actually store the points, instead storing a total points vector and the number of points. From this a centroid can be computed.

This method is used by the CanopyMapper, CanopyReducer and CanopyDriver.

Parameters:
point - the point to be added
canopies - the Collection of canopies to append to
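
The inverted, per-point form described above can be sketched without any Mahout types. The following is a minimal illustration under assumptions, not the library's implementation: RunningCanopy and StreamingCanopySketch are hypothetical names, points are plain double[] arrays, the distance is Euclidean, and the T1/T2 rule used is the usual canopy rule (a point within T1 of a center is added to that canopy; a point within T2 of any center does not start a new canopy).

```java
import java.util.Collection;

/** Hypothetical stand-in for a canopy: keeps a running sum and a count, not the points. */
class RunningCanopy {
    final double[] center;
    final double[] pointTotal;
    int numPoints;

    RunningCanopy(double[] firstPoint) {
        center = firstPoint.clone();
        pointTotal = new double[firstPoint.length];
        observe(firstPoint);
    }

    /** Folds a point into the running total. */
    void observe(double[] point) {
        for (int i = 0; i < point.length; i++) {
            pointTotal[i] += point[i];
        }
        numPoints++;
    }

    /** Centroid computed from the total and the count alone. */
    double[] centroid() {
        double[] c = new double[pointTotal.length];
        for (int i = 0; i < c.length; i++) {
            c[i] = pointTotal[i] / numPoints;
        }
        return c;
    }
}

class StreamingCanopySketch {
    /** Euclidean distance, used here in place of a DistanceMeasure. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Visits the existing canopies once for the given point. */
    static void addPointToCanopies(double[] point, Collection<RunningCanopy> canopies,
                                   double t1, double t2) {
        boolean stronglyBound = false;
        for (RunningCanopy canopy : canopies) {
            double d = distance(canopy.center, point);
            if (d < t1) {
                canopy.observe(point);      // loosely bound: counts toward this canopy
            }
            stronglyBound |= d < t2;        // tightly bound: no new canopy needed
        }
        if (!stronglyBound) {
            canopies.add(new RunningCanopy(point));
        }
    }
}
```

Because each canopy holds only a sum vector and a count, memory use in this sketch grows with the number of canopies rather than the number of points.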

canopyCovers

public boolean canopyCovers(Canopy canopy,
                            Vector point)
Returns true if the point is covered by the canopy.

Parameters:
canopy - a canopy
point - a point
Returns:
true if the point is covered
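
One plausible reading of "covered" is that the point lies within the T1 threshold of the canopy's center. The sketch below illustrates that reading only; it is an assumption rather than a statement of the library's behavior, and CoverageSketch, the double[] representation and the Euclidean distance are all hypothetical stand-ins.

```java
/** Hypothetical coverage check; assumes "covered" means distance to center < T1. */
class CoverageSketch {
    /** Euclidean distance, used here in place of a DistanceMeasure. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** True when the point is strictly closer to the center than t1. */
    static boolean canopyCovers(double[] center, double[] point, double t1) {
        return distance(center, point) < t1;
    }
}
```

The strict comparison means a point sitting exactly at distance T1 is treated as not covered in this sketch.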

createCanopies

public static List<Canopy> createCanopies(List<Vector> points,
                                          DistanceMeasure measure,
                                          double t1,
                                          double t2)
Iterate through the points, adding new canopies. Return the canopies.

Parameters:
points - a list defining the points to be clustered
measure - a DistanceMeasure to use
t1 - the T1 distance threshold
t2 - the T2 distance threshold
Returns:
the List of canopies created
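
The point-driven pass can be sketched as follows. This is a minimal illustration of the usual canopy procedure under assumptions, not the Mahout source: BatchCanopySketch and SimpleCanopy are hypothetical, points are double[] arrays, and the distance is Euclidean. The sketch copies its input first; whether the real method modifies the list passed to it is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class BatchCanopySketch {
    /** Hypothetical result type: a center plus the points gathered within T1. */
    static final class SimpleCanopy {
        final double[] center;
        final List<double[]> members = new ArrayList<>();

        SimpleCanopy(double[] center) {
            this.center = center;
            members.add(center);
        }
    }

    /** Euclidean distance, used here in place of a DistanceMeasure. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static List<SimpleCanopy> createCanopies(List<double[]> points, double t1, double t2) {
        List<double[]> remaining = new ArrayList<>(points); // work on a copy
        List<SimpleCanopy> canopies = new ArrayList<>();
        while (!remaining.isEmpty()) {
            Iterator<double[]> it = remaining.iterator();
            double[] seed = it.next();       // next unclaimed point starts a canopy
            it.remove();
            SimpleCanopy canopy = new SimpleCanopy(seed);
            canopies.add(canopy);
            while (it.hasNext()) {
                double[] candidate = it.next();
                double d = distance(seed, candidate);
                if (d < t1) {
                    canopy.members.add(candidate); // within T1: joins this canopy
                }
                if (d < t2) {
                    it.remove();                   // within T2: cannot seed another canopy
                }
            }
        }
        return canopies;
    }
}
```

With T1 larger than T2, a point between the two thresholds joins the current canopy and still stays in the candidate list, so canopies in this sketch can overlap.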

getCenters

public static List<Vector> getCenters(Iterable<Canopy> canopies)
Iterate through the canopies, adding their centroids to a list

Parameters:
canopies - an Iterable of canopies
Returns:
the List of centroids

updateCentroids

public static void updateCentroids(Iterable<Canopy> canopies)
Iterate through the canopies, resetting their center to their centroids

Parameters:
canopies - an Iterable of canopies
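
The two centroid helpers (getCenters and updateCentroids) can be illustrated together. The sketch below assumes each canopy tracks a sum vector and a point count, as described for addPointToCanopies; CentroidSketch and Accumulator are hypothetical names, and vectors are plain double[] arrays rather than Mahout Vector objects.

```java
import java.util.ArrayList;
import java.util.List;

class CentroidSketch {
    /** Hypothetical canopy stand-in holding a center, a running sum and a count. */
    static final class Accumulator {
        double[] center;
        final double[] pointTotal;
        int numPoints;

        Accumulator(double[] center) {
            this.center = center.clone();
            this.pointTotal = new double[center.length];
        }

        void observe(double[] point) {
            for (int i = 0; i < point.length; i++) {
                pointTotal[i] += point[i];
            }
            numPoints++;
        }

        /** Mean of the observed points, or the current center if none were observed. */
        double[] centroid() {
            if (numPoints == 0) {
                return center.clone();
            }
            double[] c = new double[pointTotal.length];
            for (int i = 0; i < c.length; i++) {
                c[i] = pointTotal[i] / numPoints;
            }
            return c;
        }
    }

    /** Collects one centroid per canopy, in iteration order. */
    static List<double[]> getCenters(Iterable<Accumulator> canopies) {
        List<double[]> centers = new ArrayList<>();
        for (Accumulator canopy : canopies) {
            centers.add(canopy.centroid());
        }
        return centers;
    }

    /** Moves each canopy's center to its current centroid. */
    static void updateCentroids(Iterable<Accumulator> canopies) {
        for (Accumulator canopy : canopies) {
            canopy.center = canopy.centroid();
        }
    }
}
```

In this sketch getCenters leaves the canopies unchanged, while updateCentroids mutates them in place and returns nothing, which matches the void return type in the signature above.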


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.