java.lang.Object
    org.apache.mahout.classifier.df.mapreduce.Builder
public abstract class Builder
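Base class for Mapred DecisionForest builders. Takes care of storing the parameters common to the mapred implementations.
The child classes must implement at least:
- void configureJob(Job): to configure the job before it is launched; and
- DecisionForest parseOutput(Job): to parse the job's output files and extract the trees.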
Constructor Summary
Modifier | Constructor
---|---
protected | Builder(TreeBuilder treeBuilder, org.apache.hadoop.fs.Path dataPath, org.apache.hadoop.fs.Path datasetPath, Long seed, org.apache.hadoop.conf.Configuration conf)
Method Summary
Modifier and Type | Method | Description
---|---|---
DecisionForest | build(int nbTrees) |
protected abstract void | configureJob(org.apache.hadoop.mapreduce.Job job) | Used by the inheriting classes to configure the job.
protected org.apache.hadoop.fs.Path | getDataPath() |
static org.apache.hadoop.fs.Path | getDistributedCacheFile(org.apache.hadoop.conf.Configuration conf, int index) | Helper method: returns the path at the given index among the DistributedCache files.
static int | getNbTrees(org.apache.hadoop.conf.Configuration conf) | Gets the number of trees for the map-reduce job.
static int | getNumMaps(org.apache.hadoop.conf.Configuration conf) | Returns the value of "mapred.map.tasks".
protected org.apache.hadoop.fs.Path | getOutputPath(org.apache.hadoop.conf.Configuration conf) | Returns the output directory path.
static Long | getRandomSeed(org.apache.hadoop.conf.Configuration conf) | Returns the random seed.
static TreeBuilder | getTreeBuilder(org.apache.hadoop.conf.Configuration conf) |
protected static boolean | isOutput(org.apache.hadoop.conf.Configuration conf) | Used only for DEBUG purposes.
static Dataset | loadDataset(org.apache.hadoop.conf.Configuration conf) | Helper method: loads the Dataset whose path is stored in the DistributedCache.
protected abstract DecisionForest | parseOutput(org.apache.hadoop.mapreduce.Job job) | Parses the output files to extract the trees and passes the predictions to the callback.
protected boolean | runJob(org.apache.hadoop.mapreduce.Job job) | Sequential implementations should override this method to simulate the job execution.
static void | setNbTrees(org.apache.hadoop.conf.Configuration conf, int nbTrees) | Sets the number of trees to grow for the map-reduce job.
void | setOutputDirName(String name) | Sets the output directory name; the directory will be created in the working directory.
static void | sortSplits(org.apache.hadoop.mapreduce.InputSplit[] splits) | Sorts the splits by size so that the biggest go first. This is the same code used by Hadoop's JobClient.
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
protected Builder(TreeBuilder treeBuilder, org.apache.hadoop.fs.Path dataPath, org.apache.hadoop.fs.Path datasetPath, Long seed, org.apache.hadoop.conf.Configuration conf)
Method Detail |
---|
protected org.apache.hadoop.fs.Path getDataPath()
public static int getNumMaps(org.apache.hadoop.conf.Configuration conf)
Parameters:
conf - configuration
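getNumMaps is documented as returning the value of "mapred.map.tasks". The following is a minimal sketch of that lookup, not Mahout's code: it uses java.util.Properties as a stand-in for Hadoop's Configuration, and the class name NumMapsSketch and the -1 fallback for an unset key are assumptions. With a real Configuration the equivalent call would be conf.getInt("mapred.map.tasks", defaultValue).

```java
import java.util.Properties;

// Sketch only: Properties stands in for org.apache.hadoop.conf.Configuration,
// and the -1 fallback is an assumption, not Mahout's documented behavior.
class NumMapsSketch {
    static final String MAP_TASKS_KEY = "mapred.map.tasks";

    /** Returns the configured number of map tasks, or -1 if the key is absent. */
    static int getNumMaps(Properties conf) {
        String value = conf.getProperty(MAP_TASKS_KEY);
        return value == null ? -1 : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(getNumMaps(conf)); // prints -1 (key not set)
        conf.setProperty(MAP_TASKS_KEY, "10");
        System.out.println(getNumMaps(conf)); // prints 10
    }
}
```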
protected static boolean isOutput(org.apache.hadoop.conf.Configuration conf)
Parameters:
conf - configuration
public static Long getRandomSeed(org.apache.hadoop.conf.Configuration conf)
Parameters:
conf - configuration
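Because the constructor and getRandomSeed use the wrapper type Long rather than long, the seed appears to be optional. The sketch below assumes that a missing seed is reported as null and shows why a stored seed matters: two generators built from the same seed produce the same draws. Properties stands in for Hadoop's Configuration, and the key "example.random.seed", the class RandomSeedSketch and its helper methods are all hypothetical.

```java
import java.util.Properties;
import java.util.Random;

// Sketch: Properties stands in for Hadoop's Configuration; the key name is hypothetical.
class RandomSeedSketch {
    static final String SEED_KEY = "example.random.seed"; // hypothetical key

    static void setRandomSeed(Properties conf, long seed) {
        conf.setProperty(SEED_KEY, Long.toString(seed));
    }

    /** Returns the stored seed, or null when no seed was configured (assumed behavior). */
    static Long getRandomSeed(Properties conf) {
        String value = conf.getProperty(SEED_KEY);
        return value == null ? null : Long.valueOf(value);
    }

    /** A fixed seed gives reproducible draws; a null seed falls back to an unseeded Random. */
    static Random newRandom(Long seed) {
        return seed == null ? new Random() : new Random(seed);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(getRandomSeed(conf)); // prints null
        setRandomSeed(conf, 42L);
        Long seed = getRandomSeed(conf);
        int a = newRandom(seed).nextInt(100);
        int b = newRandom(seed).nextInt(100);
        System.out.println(a == b); // prints true: same seed, same sequence
    }
}
```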
public static TreeBuilder getTreeBuilder(org.apache.hadoop.conf.Configuration conf)
public static int getNbTrees(org.apache.hadoop.conf.Configuration conf)
Parameters:
conf - configuration
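getNbTrees reads back the tree count that setNbTrees stores in the job configuration, and setNbTrees is documented to throw IllegalArgumentException when nbTrees <= 0. A minimal sketch of that pair, assuming Properties as a stand-in for Hadoop's Configuration; the key "example.nbtrees", the class NbTreesSketch and the -1 "not set" value are hypothetical.

```java
import java.util.Properties;

// Sketch: Properties stands in for Hadoop's Configuration; the key name is hypothetical.
class NbTreesSketch {
    static final String NB_TREES_KEY = "example.nbtrees"; // hypothetical key

    /** Stores the number of trees, rejecting non-positive values as setNbTrees is documented to. */
    static void setNbTrees(Properties conf, int nbTrees) {
        if (nbTrees <= 0) {
            throw new IllegalArgumentException("nbTrees should be greater than 0, got " + nbTrees);
        }
        conf.setProperty(NB_TREES_KEY, Integer.toString(nbTrees));
    }

    /** Reads the number of trees back; -1 (an assumption) signals "not set". */
    static int getNbTrees(Properties conf) {
        String value = conf.getProperty(NB_TREES_KEY);
        return value == null ? -1 : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setNbTrees(conf, 100);
        System.out.println(getNbTrees(conf)); // prints 100
        try {
            setNbTrees(conf, 0); // rejected before anything is stored
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating in the setter means a bad value fails on the client, before any job is configured, rather than inside a map task.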
public static void setNbTrees(org.apache.hadoop.conf.Configuration conf, int nbTrees)
Parameters:
conf - configuration
nbTrees - number of trees to build
Throws:
IllegalArgumentException - if nbTrees <= 0
public void setOutputDirName(String name)
Parameters:
name - output directory name
protected org.apache.hadoop.fs.Path getOutputPath(org.apache.hadoop.conf.Configuration conf) throws IOException
Parameters:
conf - configuration
Throws:
IOException - if we cannot get the default FileSystem
public static org.apache.hadoop.fs.Path getDistributedCacheFile(org.apache.hadoop.conf.Configuration conf, int index) throws IOException
Parameters:
conf - configuration
index - index of the path in the DistributedCache files
Throws:
IOException - if no path is found
public static Dataset loadDataset(org.apache.hadoop.conf.Configuration conf) throws IOException
Parameters:
conf - configuration
Throws:
IOException - if we cannot retrieve the Dataset path from the DistributedCache, or the Dataset could not be loaded
protected abstract void configureJob(org.apache.hadoop.mapreduce.Job job) throws IOException
Parameters:
job - Hadoop's Job
Throws:
IOException - if anything goes wrong while configuring the job
protected boolean runJob(org.apache.hadoop.mapreduce.Job job) throws ClassNotFoundException, IOException, InterruptedException
Parameters:
job - Hadoop's job
Throws:
ClassNotFoundException
IOException
InterruptedException
protected abstract DecisionForest parseOutput(org.apache.hadoop.mapreduce.Job job) throws IOException
Parameters:
job - Hadoop's job
Throws:
IOException - if anything goes wrong while parsing the output
public DecisionForest build(int nbTrees) throws IOException, ClassNotFoundException, InterruptedException
Throws:
IOException
ClassNotFoundException
InterruptedException
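build is the public entry point, while configureJob and parseOutput are the abstract hooks and runJob is the overridable step that sequential implementations replace to simulate the job. The following stdlib-only sketch shows how such a template method could tie them together. It assumes build validates the tree count, configures and runs the job, then parses the output; the exact sequence inside Mahout's build is not documented on this page. All names are hypothetical: SketchBuilder and SequentialSketchBuilder stand in for Builder and a sequential subclass, a Map stands in for the Hadoop Job, and a List<String> for the DecisionForest.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: Map<String, String> plays the Hadoop Job,
// List<String> plays the DecisionForest.
abstract class SketchBuilder {
    /** Template method; the order of the hook calls is an assumption. */
    public List<String> build(int nbTrees) {
        if (nbTrees <= 0) {
            throw new IllegalArgumentException("nbTrees should be > 0");
        }
        Map<String, String> job = new HashMap<>();
        job.put("nbTrees", Integer.toString(nbTrees));
        configureJob(job);       // subclass-specific setup
        if (!runJob(job)) {      // overridable execution step
            return null;         // in this sketch, a failed job yields no forest
        }
        return parseOutput(job); // subclass turns the job output into a forest
    }

    protected abstract void configureJob(Map<String, String> job);

    /** Placeholder for cluster submission; with no cluster here it reports failure. */
    protected boolean runJob(Map<String, String> job) {
        return false;
    }

    protected abstract List<String> parseOutput(Map<String, String> job);
}

/** A "sequential" subclass that overrides runJob to simulate the job in-process. */
class SequentialSketchBuilder extends SketchBuilder {
    @Override
    protected void configureJob(Map<String, String> job) {
        job.put("mode", "sequential");
    }

    @Override
    protected boolean runJob(Map<String, String> job) {
        int nbTrees = Integer.parseInt(job.get("nbTrees"));
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < nbTrees; i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append("tree-").append(i); // one fake "tree" per requested tree
        }
        job.put("output", out.toString());
        return true;
    }

    @Override
    protected List<String> parseOutput(Map<String, String> job) {
        List<String> forest = new ArrayList<>();
        for (String tree : job.get("output").split(",")) {
            forest.add(tree);
        }
        return forest;
    }

    public static void main(String[] args) {
        System.out.println(new SequentialSketchBuilder().build(3)); // prints [tree-0, tree-1, tree-2]
    }
}
```

Keeping build final in spirit and pushing the variable parts into hooks lets MapReduce and sequential builders share the same driver logic.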
public static void sortSplits(org.apache.hadoop.mapreduce.InputSplit[] splits)
Parameters:
splits - input splits
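sortSplits orders the splits by size so that the biggest go first. A minimal sketch of that ordering, assuming a hypothetical SizedSplit interface in place of org.apache.hadoop.mapreduce.InputSplit; like the real getLength(), it declares IOException and InterruptedException, so the comparator wraps them in a RuntimeException. SplitSorter is likewise a hypothetical name.

```java
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for InputSplit, whose getLength() also declares checked exceptions.
interface SizedSplit {
    long getLength() throws IOException, InterruptedException;
}

class SplitSorter {
    /** Sorts the splits in place so that the biggest split comes first. */
    static void sortSplits(SizedSplit[] splits) {
        Arrays.sort(splits, (a, b) -> {
            try {
                // Compare b to a (reverse order): larger splits sort earlier.
                return Long.compare(b.getLength(), a.getLength());
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException("could not read split length", e);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        SizedSplit[] splits = { () -> 10L, () -> 300L, () -> 42L };
        sortSplits(splits);
        for (SizedSplit s : splits) {
            System.out.println(s.getLength()); // prints 300, then 42, then 10
        }
    }
}
```

Scheduling the largest splits first tends to shorten the tail of the job, since the longest tasks start earliest.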