org.apache.mahout.clustering.canopy
Class CanopyDriver
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.clustering.canopy.CanopyDriver
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class CanopyDriver
- extends AbstractJob
Method Summary
static org.apache.hadoop.fs.Path
buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
double t3,
double t4,
int clusterFilter,
boolean runSequential)
Build a directory of Canopy clusters from the input vectors and other
arguments.
static org.apache.hadoop.fs.Path
buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
int clusterFilter,
boolean runSequential)
Convenience method for backwards compatibility
static void
main(String[] args)
static void
run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
boolean runClustering,
double clusterClassificationThreshold,
boolean runSequential)
Convenience method to provide backward compatibility
static void
run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
double t3,
double t4,
int clusterFilter,
boolean runClustering,
double clusterClassificationThreshold,
boolean runSequential)
Build a directory of Canopy clusters from the input arguments and, if
requested, cluster the input vectors using these clusters
static void
run(org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
boolean runClustering,
double clusterClassificationThreshold,
boolean runSequential)
Convenience method that creates a new Configuration(), then builds a
directory of Canopy clusters from the input arguments and, if requested,
clusters the input vectors using these clusters
int
run(String[] args)
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULT_CLUSTERED_POINTS_DIRECTORY
public static final String DEFAULT_CLUSTERED_POINTS_DIRECTORY
- See Also:
- Constant Field Values
CanopyDriver
public CanopyDriver()
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
run
public int run(String[] args)
throws Exception
- Throws:
Exception
run
public static void run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
double t3,
double t4,
int clusterFilter,
boolean runClustering,
double clusterClassificationThreshold,
boolean runSequential)
throws IOException,
InterruptedException,
ClassNotFoundException
- Build a directory of Canopy clusters from the input arguments and, if
requested, cluster the input vectors using these clusters
- Parameters:
conf - the Configuration
input - the Path to the directory containing input vectors
output - the Path for all output directories
measure - the DistanceMeasure
t1 - the double T1 distance threshold
t2 - the double T2 distance threshold
t3 - the reducer's double T1 distance threshold
t4 - the reducer's double T2 distance threshold
clusterFilter - the minimum canopy size output by the mappers
runClustering - cluster the input vectors if true
clusterClassificationThreshold - vectors whose pdf is below this value will not be clustered; the value should be between 0 and 1
runSequential - execute sequentially if true
- Throws:
IOException
InterruptedException
ClassNotFoundException
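The roles of t1 and t2 can be illustrated with a small, self-contained sketch of single-pass canopy generation. This is plain Java and not Mahout's code: the class CanopySketch, its method names, and the use of Euclidean distance in place of a DistanceMeasure are all assumptions made for illustration, as is the convention that T1 is larger than T2.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative single-pass canopy generation; hypothetical, not part of Mahout. */
final class CanopySketch {

    /** Euclidean distance, standing in for a DistanceMeasure (assumption). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns one list of point indices per canopy. The first index in each
     * list is the point that seeded the canopy and serves as its center.
     */
    static List<List<Integer>> buildCanopies(double[][] points, double t1, double t2) {
        List<double[]> centers = new ArrayList<>();
        List<List<Integer>> canopies = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            boolean stronglyBound = false;
            for (int c = 0; c < centers.size(); c++) {
                double d = distance(points[i], centers.get(c));
                if (d < t1) {
                    canopies.get(c).add(i);   // within the loose threshold: member of this canopy
                }
                if (d < t2) {
                    stronglyBound = true;     // within the tight threshold: cannot seed a new canopy
                }
            }
            if (!stronglyBound) {
                centers.add(points[i]);       // the point seeds a new canopy
                List<Integer> members = new ArrayList<>();
                members.add(i);
                canopies.add(members);
            }
        }
        return canopies;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0.5, 0}, {2, 0}, {10, 0}};
        for (List<Integer> canopy : buildCanopies(points, 3.0, 1.0)) {
            System.out.println(canopy);
        }
    }
}
```

In this sketch a point may belong to several canopies (anything closer than t1 to a center), while only points farther than t2 from every existing center can start a new canopy. With the four sample points, T1 = 3 and T2 = 1, the point at index 2 ends up both as a member of the first canopy and as the seed of the second.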
run
public static void run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
boolean runClustering,
double clusterClassificationThreshold,
boolean runSequential)
throws IOException,
InterruptedException,
ClassNotFoundException
- Convenience method to provide backward compatibility
- Throws:
IOException
InterruptedException
ClassNotFoundException
run
public static void run(org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
boolean runClustering,
double clusterClassificationThreshold,
boolean runSequential)
throws IOException,
InterruptedException,
ClassNotFoundException
- Convenience method that creates a new Configuration(), then builds a
directory of Canopy clusters from the input arguments and, if requested,
clusters the input vectors using these clusters
- Parameters:
input - the Path to the directory containing input vectors
output - the Path for all output directories
t1 - the double T1 distance threshold
t2 - the double T2 distance threshold
runClustering - cluster the input vectors if true
clusterClassificationThreshold - vectors whose pdf is below this value will not be clustered; the value should be between 0 and 1
runSequential - execute sequentially if true
- Throws:
IOException
InterruptedException
ClassNotFoundException
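The effect of clusterClassificationThreshold during the optional clustering step can be sketched as follows. The page does not say how the pdf is computed, so this example assumes a score of 1 / (1 + distance) to each cluster center, normalized so the scores sum to 1; that formula, the class ThresholdClassifySketch, and the return value of -1 for an unclustered vector are illustrative assumptions rather than a description of Mahout's implementation.

```java
/** Illustrative thresholded assignment of a vector to a cluster; hypothetical, not part of Mahout. */
final class ThresholdClassifySketch {

    /** Euclidean distance, standing in for a DistanceMeasure (assumption). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the index of the best-scoring center, or -1 when the best
     * normalized score falls below the threshold (the vector is not clustered).
     */
    static int classify(double[] point, double[][] centers, double threshold) {
        double[] scores = new double[centers.length];
        double total = 0.0;
        for (int c = 0; c < centers.length; c++) {
            scores[c] = 1.0 / (1.0 + distance(point, centers[c])); // assumed pdf stand-in
            total += scores[c];
        }
        int best = -1;
        double bestScore = -1.0;
        for (int c = 0; c < centers.length; c++) {
            double normalized = scores[c] / total;                 // scores now sum to 1
            if (normalized > bestScore) {
                bestScore = normalized;
                best = c;
            }
        }
        return bestScore >= threshold ? best : -1;
    }

    public static void main(String[] args) {
        double[][] centers = {{0, 0}, {10, 0}};
        System.out.println(classify(new double[] {1, 0}, centers, 0.5));
        System.out.println(classify(new double[] {5, 0}, centers, 0.6));
    }
}
```

Under these assumptions a threshold of 0 assigns every vector to its best cluster, while values closer to 1 leave ambiguous vectors, such as one equidistant from two centers, unassigned.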
buildClusters
public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
int clusterFilter,
boolean runSequential)
throws IOException,
InterruptedException,
ClassNotFoundException
- Convenience method for backwards compatibility
- Throws:
IOException
InterruptedException
ClassNotFoundException
buildClusters
public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
double t3,
double t4,
int clusterFilter,
boolean runSequential)
throws IOException,
InterruptedException,
ClassNotFoundException
- Build a directory of Canopy clusters from the input vectors and other
arguments, running either the sequential or the MapReduce implementation as requested
- Parameters:
conf - the Configuration to use
input - the Path to the directory containing input vectors
output - the Path for all output directories
measure - the DistanceMeasure
t1 - the double T1 distance threshold
t2 - the double T2 distance threshold
t3 - the reducer's double T1 distance threshold
t4 - the reducer's double T2 distance threshold
clusterFilter - the int minimum size of canopies produced
runSequential - a boolean indicating whether to run the sequential (reference) algorithm
- Returns:
- the canopy output directory Path
- Throws:
IOException
InterruptedException
ClassNotFoundException
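How t1/t2, t3/t4 and clusterFilter could interact in a map and reduce arrangement is sketched below in plain Java. The sketch assumes that each mapper builds canopies over its own split with t1 and t2, drops canopies smaller than clusterFilter, and that a single reducer then runs the same pass over the surviving centers with t3 and t4. The class TwoStageCanopySketch, the use of the seeding point as the canopy center, Euclidean distance, and the inclusive size comparison are all illustrative assumptions, not Mahout's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative two-stage canopy generation; hypothetical, not part of Mahout. */
final class TwoStageCanopySketch {

    /** Euclidean distance, standing in for a DistanceMeasure (assumption). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** One canopy pass; returns the centers of canopies holding at least minSize points. */
    static List<double[]> canopyCenters(List<double[]> points, double loose, double tight, int minSize) {
        List<double[]> centers = new ArrayList<>();
        List<Integer> sizes = new ArrayList<>();
        for (double[] p : points) {
            boolean stronglyBound = false;
            for (int c = 0; c < centers.size(); c++) {
                double d = distance(p, centers.get(c));
                if (d < loose) {
                    sizes.set(c, sizes.get(c) + 1); // count the point as a canopy member
                }
                if (d < tight) {
                    stronglyBound = true;           // too close to seed a new canopy
                }
            }
            if (!stronglyBound) {
                centers.add(p);
                sizes.add(1);
            }
        }
        List<double[]> kept = new ArrayList<>();
        for (int c = 0; c < centers.size(); c++) {
            if (sizes.get(c) >= minSize) {          // inclusive bound is an assumption
                kept.add(centers.get(c));
            }
        }
        return kept;
    }

    /** "Map" each split with t1/t2 and the size filter, then "reduce" the centers with t3/t4. */
    static List<double[]> twoStage(List<List<double[]>> splits, double t1, double t2,
                                   double t3, double t4, int clusterFilter) {
        List<double[]> mapperCenters = new ArrayList<>();
        for (List<double[]> split : splits) {
            mapperCenters.addAll(canopyCenters(split, t1, t2, clusterFilter));
        }
        return canopyCenters(mapperCenters, t3, t4, 0);
    }

    public static void main(String[] args) {
        List<List<double[]>> splits = new ArrayList<>();
        splits.add(Arrays.asList(new double[] {0, 0}, new double[] {0.2, 0}, new double[] {5, 0}));
        splits.add(Arrays.asList(new double[] {0.1, 0}, new double[] {5.1, 0}, new double[] {5.2, 0}));
        for (double[] center : twoStage(splits, 2.0, 1.0, 2.0, 1.0, 1)) {
            System.out.println(Arrays.toString(center));
        }
    }
}
```

Setting t3 and t4 equal to t1 and t2, as in the sample main method, mirrors what the shorter buildClusters overload appears to offer for backwards compatibility; raising clusterFilter removes sparsely populated canopies before the second stage sees them.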
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.