org.apache.mahout.math.hadoop.decomposer
Class EigenVerificationJob
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.math.hadoop.decomposer.EigenVerificationJob
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class EigenVerificationJob
- extends AbstractJob
Takes the output of an eigendecomposition (specified as a Path location) and verifies its correctness in the
following sense: given a vector e and a matrix m, let e' = m.timesSquared(e); the error w.r.t.
eigenvector-ness is the cosine of the angle between e and e':
error(e,e') = e.dot(e') / (e.norm(2)*e'.norm(2))
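The measure above can be illustrated with a minimal plain-Java sketch. It does not use the Mahout API: the class EigenErrorSketch and its helper methods are hypothetical, and it assumes that timesSquared(v) means m^T (m v), so the vectors being checked are eigenvectors of m^T m.

```java
final class EigenErrorSketch {

    /** Plain-array stand-in for m.timesSquared(v); ASSUMES this means m^T (m v). */
    static double[] timesSquared(double[][] m, double[] v) {
        double[] result = new double[v.length];
        for (double[] row : m) {
            // Each row contributes (row . v) * row to m^T (m v).
            double rowDotV = dot(row, v);
            for (int j = 0; j < v.length; j++) {
                result[j] += rowDotV * row[j];
            }
        }
        return result;
    }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static double norm2(double[] a) {
        return Math.sqrt(dot(a, a));
    }

    /** error(e, e') = e.dot(e') / (e.norm(2) * e'.norm(2)), with e' = timesSquared(m, e). */
    static double error(double[][] m, double[] e) {
        double[] ePrime = timesSquared(m, e);
        return dot(e, ePrime) / (norm2(e) * norm2(ePrime));
    }

    public static void main(String[] args) {
        double[][] m = {{2.0, 0.0}, {0.0, 1.0}};
        // (1, 0) is an exact eigenvector of m^T m; (1, 1) is not.
        System.out.println(error(m, new double[] {1.0, 0.0}));
        System.out.println(error(m, new double[] {1.0, 1.0}));
    }
}
```

For the diagonal matrix in main, the measure is 1 for (1, 0), because e and e' are parallel, and strictly less than 1 for (1, 1).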
A set of eigenvectors should also be very close to mutually orthogonal, so this job computes all inner products
between eigenvectors and checks that the resulting matrix of inner products is close to the identity matrix.
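As a rough illustration of the orthogonality check, the following plain-Java sketch computes every pairwise inner product and reports the largest deviation from the identity matrix. The class OrthogonalityCheckSketch and its method are hypothetical and not part of Mahout, and the sketch assumes the eigenvectors are already normalized to unit length, since only then should the diagonal entries be 1.

```java
final class OrthogonalityCheckSketch {

    /**
     * Returns the largest absolute difference between the matrix of inner
     * products of the given vectors and the identity matrix.
     * ASSUMES all vectors have the same length and unit norm.
     */
    static double maxDeviationFromIdentity(double[][] vectors) {
        double worst = 0.0;
        for (int i = 0; i < vectors.length; i++) {
            for (int j = 0; j < vectors.length; j++) {
                double inner = 0.0;
                for (int k = 0; k < vectors[i].length; k++) {
                    inner += vectors[i][k] * vectors[j][k];
                }
                // Identity matrix: 1 on the diagonal, 0 elsewhere.
                double expected = (i == j) ? 1.0 : 0.0;
                worst = Math.max(worst, Math.abs(inner - expected));
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double[][] orthonormal = {{1.0, 0.0}, {0.0, 1.0}};
        System.out.println(maxDeviationFromIdentity(orthonormal));
    }
}
```

A caller would compare the returned value against a tolerance of its own choosing; the source does not state which threshold the job applies.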
Parameters used in the cleanup (other than in the input/output path options) include --minEigenvalue, which specifies
the value below which eigenvector/eigenvalue pairs will be discarded, and --maxError, which specifies the maximum
error (as defined above) to be tolerated in an eigenvector.
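The effect of the two cleanup thresholds can be sketched in plain Java as follows. The class EigenCleanupSketch and its method are hypothetical, and treating both thresholds as inclusive bounds is an assumption; the text above only says that pairs below --minEigenvalue are discarded and that --maxError is the largest tolerated error.

```java
import java.util.ArrayList;
import java.util.List;

final class EigenCleanupSketch {

    /**
     * Returns the indices of the eigenvector/eigenvalue pairs that survive cleanup.
     * errors[i] is the verification error already computed for pair i.
     * ASSUMPTION: a pair exactly at either threshold is kept.
     */
    static List<Integer> indicesToKeep(double[] eigenvalues, double[] errors,
                                       double maxError, double minEigenvalue) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < eigenvalues.length; i++) {
            boolean largeEnough = eigenvalues[i] >= minEigenvalue;
            boolean accurateEnough = errors[i] <= maxError;
            if (largeEnough && accurateEnough) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {5.0, 0.001, 3.0};
        double[] errors = {0.01, 0.01, 0.5};
        System.out.println(indicesToKeep(eigenvalues, errors, 0.05, 1.0));
    }
}
```

In the example in main, the second pair is dropped because its eigenvalue is below the minimum, and the third because its error exceeds the maximum.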
If all the eigenvectors fit in memory, --inMemory speeds up the task by loading them into memory.
Method Summary
org.apache.hadoop.fs.Path | getCleanedEigensPath()
static void | main(String[] args)
int | run(org.apache.hadoop.fs.Path corpusInput, org.apache.hadoop.fs.Path eigenInput, org.apache.hadoop.fs.Path output, org.apache.hadoop.fs.Path tempOut, double maxError, double minEigenValue, boolean inMemory, org.apache.hadoop.conf.Configuration conf) | Run the job with the given arguments
int | run(String[] args)
void | runJob(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path eigenInput, org.apache.hadoop.fs.Path corpusInput, org.apache.hadoop.fs.Path output, boolean inMemory, double maxError, int maxEigens) | Programmatic invocation of run()
void | setEigensToVerify(VectorIterable eigens)
Methods inherited from class org.apache.mahout.common.AbstractJob |
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
CLEAN_EIGENVECTORS
public static final String CLEAN_EIGENVECTORS
- See Also:
- Constant Field Values
EigenVerificationJob
public EigenVerificationJob()
setEigensToVerify
public void setEigensToVerify(VectorIterable eigens)
run
public int run(String[] args)
throws Exception
- Throws:
Exception
run
public int run(org.apache.hadoop.fs.Path corpusInput,
org.apache.hadoop.fs.Path eigenInput,
org.apache.hadoop.fs.Path output,
org.apache.hadoop.fs.Path tempOut,
double maxError,
double minEigenValue,
boolean inMemory,
org.apache.hadoop.conf.Configuration conf)
throws IOException
- Run the job with the given arguments
- Parameters:
corpusInput - the corpus input Path
eigenInput - the eigenvector input Path
output - the output Path
tempOut - temporary output Path
maxError - a double representing the maximum error
minEigenValue - a double representing the minimum eigenvalue
inMemory - a boolean requesting in-memory preparation
conf - the Configuration to use, or null if a default is ok (saves referencing Configuration in calling classes unless needed)
- Throws:
IOException
getCleanedEigensPath
public org.apache.hadoop.fs.Path getCleanedEigensPath()
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
runJob
public void runJob(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path eigenInput,
org.apache.hadoop.fs.Path corpusInput,
org.apache.hadoop.fs.Path output,
boolean inMemory,
double maxError,
int maxEigens)
throws IOException
- Programmatic invocation of run()
- Parameters:
eigenInput - Output of LanczosSolver
corpusInput - Input of LanczosSolver
- Throws:
IOException
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.