org.apache.mahout.clustering.streaming.mapreduce
Class StreamingKMeansUtilsMR

java.lang.Object
  extended by org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansUtilsMR

public final class StreamingKMeansUtilsMR
extends Object


Method Summary
static Iterable<Centroid> castVectorsToCentroids(Iterable<Vector> input)
          Returns an Iterable of Centroid from an Iterable of Vector, either by casting each Vector to Centroid (if the instance is already a Centroid) or by creating a new Centroid based on that Vector.
static Iterable<Centroid> getCentroidsFromVectorWritable(Iterable<VectorWritable> inputIterable)
          Returns an Iterable of centroids from an Iterable of VectorWritables by creating a new Centroid containing a RandomAccessSparseVector as a delegate for each VectorWritable.
static UpdatableSearcher searcherFromConfiguration(org.apache.hadoop.conf.Configuration conf)
          Instantiates a searcher from a given configuration.
static void writeCentroidsToSequenceFile(Iterable<Centroid> centroids, org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
          Writes centroids to a sequence file.
static void writeVectorsToSequenceFile(Iterable<? extends Vector> datapoints, org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
          Writes vectors to a sequence file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

searcherFromConfiguration

public static UpdatableSearcher searcherFromConfiguration(org.apache.hadoop.conf.Configuration conf)
Instantiates a searcher from a given configuration.

Parameters:
conf - the configuration
Returns:
the instantiated searcher
Throws:
RuntimeException - if the distance measure class cannot be instantiated
IllegalStateException - if an unknown searcher class was requested
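
The two failure modes listed above can be illustrated with a small, self-contained model. This is only a sketch: it uses a plain Map in place of the Hadoop Configuration, and the types Distance, Euclidean and BruteForce, as well as the keys "distanceMeasure" and "searcherClass", are hypothetical stand-ins, not Mahout names. It assumes the distance measure is created reflectively from a class name and the searcher is selected by comparing class names.

```java
import java.util.HashMap;
import java.util.Map;

public class SearcherFactorySketch {
    // Hypothetical stand-in for a distance measure.
    public interface Distance {
        double between(double[] a, double[] b);
    }

    // Hypothetical Euclidean distance with a public no-argument constructor.
    public static class Euclidean implements Distance {
        @Override
        public double between(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return Math.sqrt(sum);
        }
    }

    // Hypothetical stand-in for a searcher that only holds its distance measure.
    public static class BruteForce {
        public final Distance distance;

        public BruteForce(Distance distance) {
            this.distance = distance;
        }
    }

    // Builds a searcher from string settings. The keys are assumptions
    // made for this sketch, not the option names Mahout uses.
    public static BruteForce fromConfig(Map<String, String> conf) {
        Distance distance;
        try {
            // Reflective instantiation; any failure (missing key, unknown
            // class, wrong type) is wrapped in a RuntimeException.
            distance = (Distance) Class.forName(conf.get("distanceMeasure"))
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (Exception e) {
            throw new RuntimeException("Failed to instantiate distance measure", e);
        }

        String searcher = conf.get("searcherClass");
        if (BruteForce.class.getName().equals(searcher)) {
            return new BruteForce(distance);
        }
        // Any class name this factory does not know is rejected.
        throw new IllegalStateException("Unknown searcher class: " + searcher);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("distanceMeasure", Euclidean.class.getName());
        conf.put("searcherClass", BruteForce.class.getName());
        BruteForce searcher = fromConfig(conf);
        System.out.println(searcher.distance.between(new double[] {0, 0}, new double[] {3, 4}));
    }
}
```

In this model a caller sees a RuntimeException for a bad distance measure setting and an IllegalStateException for an unrecognized searcher name, mirroring the Throws list above.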

getCentroidsFromVectorWritable

public static Iterable<Centroid> getCentroidsFromVectorWritable(Iterable<VectorWritable> inputIterable)
Returns an Iterable of centroids from an Iterable of VectorWritables by creating a new Centroid containing a RandomAccessSparseVector as a delegate for each VectorWritable.

Parameters:
inputIterable - VectorWritable Iterable to get Centroids from
Returns:
the new Centroids
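
One consequence of creating a new Centroid for each element is that every result owns its own data. The sketch below models this with hypothetical stand-in types (Holder for VectorWritable, Cent for Centroid). It assumes that new centroids are numbered sequentially and that the input holder may be refilled in place between iterations, as Hadoop commonly does with Writable values. Whether that reuse is the motivation for the copy in Mahout is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class CopySketch {
    // Hypothetical stand-in for VectorWritable: a mutable holder that a
    // reader may refill in place between iterations.
    public static class Holder {
        public double[] values;
    }

    // Hypothetical stand-in for Centroid: an index plus its own copy of the data.
    public static class Cent {
        public final int index;
        public final double[] values;

        public Cent(int index, double[] values) {
            this.index = index;
            this.values = values.clone(); // copy, do not alias the holder's array
        }
    }

    // Pushes each row through one reused holder and builds a new,
    // sequentially numbered Cent per row. Rows are assumed to have equal length.
    public static List<Cent> toCentroids(double[][] rows) {
        List<Cent> out = new ArrayList<>();
        if (rows.length == 0) {
            return out;
        }
        Holder reused = new Holder();
        reused.values = new double[rows[0].length];
        int next = 0;
        for (double[] row : rows) {
            // Overwrite the holder in place, as a reusing reader would.
            System.arraycopy(row, 0, reused.values, 0, row.length);
            out.add(new Cent(next++, reused.values));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cent> cents = toCentroids(new double[][] {{1, 2}, {3, 4}});
        for (Cent c : cents) {
            System.out.println(c.index + ": " + c.values[0] + ", " + c.values[1]);
        }
    }
}
```

Without the copy in the Cent constructor, every centroid in this model would end up sharing the holder's array and report the values of the last row.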

castVectorsToCentroids

public static Iterable<Centroid> castVectorsToCentroids(Iterable<Vector> input)
Returns an Iterable of Centroid from an Iterable of Vector, either by casting each Vector to Centroid (if the instance is already a Centroid) or by creating a new Centroid based on that Vector. The input is expected not to interleave plain Vectors with Centroids; if it does, the numbering of the new Centroids becomes invalid.

Parameters:
input - Iterable of Vectors to cast
Returns:
the resulting Centroids (cast or newly created)
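
The numbering caveat is easier to see in a small model. The sketch below uses hypothetical stand-in types (Vec and Cent) instead of Mahout's Vector and Centroid, and assumes that new centroids take their index from a counter that advances only when a plain vector is wrapped. That numbering scheme is an assumption for illustration, not something this page confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CastSketch {
    // Hypothetical stand-in for a Mahout Vector.
    public static class Vec {
        public final double[] values;

        public Vec(double... values) {
            this.values = values;
        }
    }

    // Hypothetical stand-in for a Mahout Centroid: a vector with an index.
    public static class Cent extends Vec {
        public final int index;

        public Cent(int index, double... values) {
            super(values);
            this.index = index;
        }
    }

    // Casts elements that are already Cent; wraps the others and assigns
    // the next index from a local counter (assumed numbering scheme).
    public static List<Cent> castOrWrap(Iterable<? extends Vec> input) {
        List<Cent> out = new ArrayList<>();
        int next = 0;
        for (Vec v : input) {
            if (v instanceof Cent) {
                out.add((Cent) v); // keeps whatever index it already had
            } else {
                out.add(new Cent(next++, v.values));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Interleaved input: existing centroids 0 and 1 mixed with plain vectors.
        List<Vec> mixed = Arrays.asList(
                new Cent(0, 1.0), new Vec(2.0), new Cent(1, 3.0), new Vec(4.0));
        for (Cent c : castOrWrap(mixed)) {
            System.out.println(c.index);
        }
    }
}
```

With the interleaved input in main, the wrapped vectors receive indexes 0 and 1, which collide with the indexes of the centroids that were merely cast. An input made only of plain vectors, or only of centroids, does not have this problem in the model.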

writeCentroidsToSequenceFile

public static void writeCentroidsToSequenceFile(Iterable<Centroid> centroids,
                                                org.apache.hadoop.fs.Path path,
                                                org.apache.hadoop.conf.Configuration conf)
                                         throws IOException
Writes centroids to a sequence file.

Parameters:
centroids - the centroids to write.
path - the path of the output file.
conf - the configuration of the file system (HDFS) to write the file to.
Throws:
IOException

writeVectorsToSequenceFile

public static void writeVectorsToSequenceFile(Iterable<? extends Vector> datapoints,
                                              org.apache.hadoop.fs.Path path,
                                              org.apache.hadoop.conf.Configuration conf)
                                       throws IOException
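Writes vectors to a sequence file.

Parameters:
datapoints - the vectors to write.
path - the path of the output file.
conf - the configuration of the file system to write the file to.
Throws:
IOException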


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.