, and distributed operations are executed as M/R passes on
Hadoop. The usage is as follows:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.hadoop.DistributedRowMatrix;

// inputPath must point to an existing SequenceFile of matrix rows.
DistributedRowMatrix m = new DistributedRowMatrix(new Path("path/to/vector/sequenceFile"),
    new Path("tmp/path"), 10000000, 250000);
m.setConf(new Configuration());
// To multiply a vector by this matrix, its dimension must equal the row dimension of this matrix.
// To timesSquared() a vector by this matrix, its dimension must equal the column dimension of the matrix.
Vector v = new DenseVector(250000);
// The following operation runs as a MapReduce pass on Hadoop.
Vector w = m.timesSquared(v);
```
Constructor Summary
DistributedRowMatrix(org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputTmpPath, int numRows, int numCols)
DistributedRowMatrix(org.apache.hadoop.fs.Path inputPath, org.apache.hadoop.fs.Path outputTmpPath, int numRows, int numCols, boolean keepTempFiles)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
KEEP_TEMP_FILES
public static final String KEEP_TEMP_FILES
- See Also:
- Constant Field Values
DistributedRowMatrix
public DistributedRowMatrix(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputTmpPath,
int numRows,
int numCols)
DistributedRowMatrix
public DistributedRowMatrix(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputTmpPath,
int numRows,
int numCols,
boolean keepTempFiles)
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
getConf
in interface org.apache.hadoop.conf.Configurable
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
- Specified by:
setConf
in interface org.apache.hadoop.conf.Configurable
getRowPath
public org.apache.hadoop.fs.Path getRowPath()
getOutputTempPath
public org.apache.hadoop.fs.Path getOutputTempPath()
setOutputTempPathString
public void setOutputTempPathString(String outPathString)
iterateAll
public Iterator<MatrixSlice> iterateAll()
- Specified by:
iterateAll
in interface VectorIterable
numSlices
public int numSlices()
- Specified by:
numSlices
in interface VectorIterable
numRows
public int numRows()
- Specified by:
numRows
in interface VectorIterable
numCols
public int numCols()
- Specified by:
numCols
in interface VectorIterable
times
public DistributedRowMatrix times(DistributedRowMatrix other)
throws IOException
- Computes this.transpose().times(other), i.e. the product of the transpose of this matrix with other.
- Parameters:
other
- a DistributedRowMatrix
- Returns:
- a DistributedRowMatrix containing the product
- Throws:
IOException
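Since both operands are stored as rows, this.transpose().times(other) does not require materializing the transpose. Writing a_i and b_i for row i of this matrix and of other, the product equals the sum over i of the outer products a_i b_i^T, so it is only defined when both matrices have the same number of rows. The plain-Java sketch below illustrates that identity on small dense arrays; it is a hypothetical in-memory model (TransposeTimesSketch and transposeTimes are illustrative names), not Mahout's MapReduce implementation.

```java
import java.util.Arrays;

/** Hypothetical in-memory illustration of A^T * B computed as a sum of row outer products. */
final class TransposeTimesSketch {

  /** Returns a^T * b, where a is n x m and b is n x p, both stored row by row. */
  static double[][] transposeTimes(double[][] a, double[][] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("row counts differ: " + a.length + " vs " + b.length);
    }
    int m = a.length == 0 ? 0 : a[0].length;
    int p = b.length == 0 ? 0 : b[0].length;
    double[][] c = new double[m][p];
    // Each shared row index i contributes the outer product a_i * b_i^T.
    for (int i = 0; i < a.length; i++) {
      for (int j = 0; j < m; j++) {
        double aij = a[i][j];
        if (aij == 0.0) {
          continue; // zero entries contribute nothing, as with a sparse row
        }
        for (int k = 0; k < p; k++) {
          c[j][k] += aij * b[i][k];
        }
      }
    }
    return c;
  }

  public static void main(String[] args) {
    double[][] a = {{1, 2}, {3, 4}, {5, 6}}; // 3 x 2
    double[][] b = {{1, 0}, {0, 1}, {1, 1}}; // 3 x 2
    System.out.println(Arrays.deepToString(transposeTimes(a, b)));
  }
}
```

Because each outer product depends only on one pair of matching rows, the terms can be computed independently and summed afterwards, which is what makes this formulation convenient for row-partitioned data.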
times
public DistributedRowMatrix times(DistributedRowMatrix other,
org.apache.hadoop.fs.Path outPath)
throws IOException
- Computes this.transpose().times(other), i.e. the product of the transpose of this matrix with other.
- Parameters:
other
- a DistributedRowMatrix
outPath
- path to write the result to
- Returns:
- a DistributedRowMatrix containing the product
- Throws:
IOException
columnMeans
public Vector columnMeans()
throws IOException
- Throws:
IOException
columnMeans
public Vector columnMeans(String vectorClass)
throws IOException
- Returns the column-wise mean of this DistributedRowMatrix.
- Parameters:
vectorClass
- desired class of the column-wise mean vector, e.g.
RandomAccessSparseVector or DenseVector
- Returns:
- Vector containing the column-wise mean of this matrix
- Throws:
IOException
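The column-wise mean is the sum of all rows divided by the number of rows. With sparse rows, absent entries are zeros and still count toward that divisor. A minimal in-memory sketch, assuming rows are given as hypothetical Map<Integer, Double> objects (column index to value) rather than Mahout vectors; ColumnMeansSketch is an illustrative name:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory illustration of a column-wise mean over sparse rows. */
final class ColumnMeansSketch {

  /**
   * Each row maps column index to value; missing entries are implicit zeros.
   * Returns a dense array of length numCols holding the mean of each column.
   */
  static double[] columnMeans(List<Map<Integer, Double>> rows, int numCols) {
    double[] sums = new double[numCols];
    // Accumulate per-column sums over all rows.
    for (Map<Integer, Double> row : rows) {
      for (Map.Entry<Integer, Double> e : row.entrySet()) {
        sums[e.getKey()] += e.getValue();
      }
    }
    // Divide by the total row count, so implicit zeros count toward the mean.
    // With no rows at all, the sums stay zero.
    if (!rows.isEmpty()) {
      for (int j = 0; j < numCols; j++) {
        sums[j] /= rows.size();
      }
    }
    return sums;
  }

  public static void main(String[] args) {
    List<Map<Integer, Double>> rows = List.of(
        Map.of(0, 2.0, 2, 4.0),
        Map.of(1, 3.0),
        Map.of(0, 4.0, 2, 2.0));
    System.out.println(Arrays.toString(columnMeans(rows, 3)));
  }
}
```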
transpose
public DistributedRowMatrix transpose()
throws IOException
- Throws:
IOException
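Transposing a row-stored matrix amounts to re-keying every entry (i, j, value) by its column j and grouping the entries that share a column into the rows of the result, the kind of regrouping a MapReduce shuffle performs. The sketch below models this in memory with nested TreeMaps; the TransposeSketch class and the map-of-maps representation are assumptions for illustration only.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory illustration of transposing a row-wise sparse matrix. */
final class TransposeSketch {

  /** Input and output both map row index to (column index to value). */
  static Map<Integer, Map<Integer, Double>> transpose(Map<Integer, Map<Integer, Double>> rows) {
    Map<Integer, Map<Integer, Double>> result = new TreeMap<>();
    for (Map.Entry<Integer, Map<Integer, Double>> row : rows.entrySet()) {
      int i = row.getKey();
      for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
        // Entry (i, j) of the input becomes entry (j, i) of the output,
        // grouped under its former column index j.
        result.computeIfAbsent(cell.getKey(), k -> new TreeMap<>()).put(i, cell.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<Integer, Map<Integer, Double>> rows = new TreeMap<>();
    rows.put(0, Map.of(1, 5.0));
    rows.put(2, Map.of(0, 7.0, 1, 9.0));
    System.out.println(transpose(rows));
  }
}
```

Empty rows produce no output entries, so only columns that contain at least one non-zero value appear as rows of the transpose.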
times
public Vector times(Vector v)
- Specified by:
times
in interface VectorIterable
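The sketch below illustrates the conventional row-by-row matrix-vector product y = A * v, in which entry i of the result is the dot product of row i with v, so a single scan over the rows suffices. That times(Vector) uses exactly this orientation is an assumption here, not something this page states; the TimesVectorSketch class is hypothetical and works on in-memory arrays.

```java
import java.util.Arrays;

/** Hypothetical in-memory illustration of y = A * v computed one row at a time. */
final class TimesVectorSketch {

  /** Each output entry y[i] is the dot product of row i with v. */
  static double[] times(double[][] rows, double[] v) {
    double[] y = new double[rows.length];
    for (int i = 0; i < rows.length; i++) {
      if (rows[i].length != v.length) {
        throw new IllegalArgumentException("row " + i + " has " + rows[i].length
            + " columns but v has " + v.length + " entries");
      }
      double dot = 0.0;
      for (int j = 0; j < v.length; j++) {
        dot += rows[i][j] * v[j];
      }
      y[i] = dot;
    }
    return y;
  }

  public static void main(String[] args) {
    double[][] a = {{1, 2}, {3, 4}, {5, 6}};
    System.out.println(Arrays.toString(times(a, new double[] {1, 1})));
  }
}
```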
timesSquared
public Vector timesSquared(Vector v)
- Specified by:
timesSquared
in interface VectorIterable
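Assuming timesSquared(v) computes A^T (A v), which is consistent with the usage comment requiring v to have the column dimension, the product can be formed in one pass over the rows: each row a_i contributes (a_i . v) a_i to the result, so there is no need for one pass to compute A v and a second to multiply by A^T. A hypothetical in-memory sketch of that accumulation (TimesSquaredSketch is an illustrative name):

```java
import java.util.Arrays;

/** Hypothetical in-memory illustration of A^T * (A * v) computed in one pass over the rows. */
final class TimesSquaredSketch {

  /** Accumulates (a_i . v) * a_i over all rows a_i, which equals A^T (A v). */
  static double[] timesSquared(double[][] rows, double[] v) {
    double[] y = new double[v.length];
    for (double[] row : rows) {
      if (row.length != v.length) {
        throw new IllegalArgumentException("row length " + row.length + " != " + v.length);
      }
      // Scale factor for this row: its dot product with v.
      double scale = 0.0;
      for (int j = 0; j < v.length; j++) {
        scale += row[j] * v[j];
      }
      if (scale == 0.0) {
        continue; // this row contributes nothing
      }
      // Add scale * row to the running result.
      for (int j = 0; j < v.length; j++) {
        y[j] += scale * row[j];
      }
    }
    return y;
  }

  public static void main(String[] args) {
    double[][] a = {{1, 2}, {3, 4}, {5, 6}};
    System.out.println(Arrays.toString(timesSquared(a, new double[] {1, 1})));
  }
}
```

For this input, A v = [3, 7, 11] and A^T applied to that gives [79, 100]; the one-pass accumulation reaches the same vector by adding 3 * [1, 2], 7 * [3, 4] and 11 * [5, 6].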
iterator
public Iterator<MatrixSlice> iterator()
- Specified by:
iterator
in interface Iterable<MatrixSlice>
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.