org.apache.mahout.utils.vectors
Class RowIdJob
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.utils.vectors.RowIdJob
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class RowIdJob
- extends AbstractJob
Converts a vector representation of documents into a document x term matrix.
The input data is in SequenceFile<Text,VectorWritable> format (as generated by SparseVectorsFromSequenceFiles or by EncodedVectorsFromSequenceFiles). The job generates the following two files as output:
- A file called "matrix" of format SequenceFile<IntWritable,VectorWritable>.
- A file called "docIndex" of format SequenceFile<IntWritable,Text>.
The input file can be regenerated by joining the two output files on the generated int key.
In other words, RowIdJob replaces the document text ids with integers.
The original document text ids can still be retrieved from "docIndex".
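The re-keying and the join described above can be illustrated with a small plain-Java sketch. It is not Mahout code and does not read or write SequenceFiles: in-memory maps stand in for the input and the two outputs, and the names RowIdSketch, Result, assignRowIds and join are hypothetical. It also assumes that row ids are consecutive integers starting at 0 in input order, which this page does not state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java illustration of the re-keying described above. Hypothetical names; not Mahout code. */
public class RowIdSketch {

    /** Stand-in for the two outputs: "matrix" (int -> vector) and "docIndex" (int -> text id). */
    static final class Result {
        final Map<Integer, double[]> matrix = new LinkedHashMap<>();
        final Map<Integer, String> docIndex = new LinkedHashMap<>();
    }

    /**
     * Replaces each text id with an int row id.
     * Assumption: ids are consecutive, start at 0, and follow input iteration order.
     */
    static Result assignRowIds(Map<String, double[]> vectorsByDocId) {
        Result result = new Result();
        int rowId = 0;
        for (Map.Entry<String, double[]> entry : vectorsByDocId.entrySet()) {
            result.matrix.put(rowId, entry.getValue());
            result.docIndex.put(rowId, entry.getKey());
            rowId++;
        }
        return result;
    }

    /** Joins "matrix" and "docIndex" on the int key to rebuild the text-keyed input. */
    static Map<String, double[]> join(Result result) {
        Map<String, double[]> rebuilt = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : result.docIndex.entrySet()) {
            rebuilt.put(entry.getValue(), result.matrix.get(entry.getKey()));
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        // Two made-up documents with three-term vectors.
        Map<String, double[]> input = new LinkedHashMap<>();
        input.put("/docs/a.txt", new double[] {1.0, 0.0, 2.0});
        input.put("/docs/b.txt", new double[] {0.0, 3.0, 0.0});

        Result result = assignRowIds(input);
        System.out.println("docIndex: " + result.docIndex);
        System.out.println("join restores ids: "
            + join(result).keySet().equals(input.keySet()));
    }
}
```

The point of the sketch is that neither output alone carries the text ids together with the vectors; only the shared int key ties a row of "matrix" back to its document.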
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
RowIdJob
public RowIdJob()
run
public int run(String[] args)
throws Exception
- Throws:
Exception
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.