Package org.apache.mahout.classifier.df.mapreduce.inmem

In-memory mapreduce implementation of Random Decision Forests

Class Summary
InMemBuilder MapReduce implementation where each mapper loads a full copy of the data in-memory.
InMemInputFormat Custom InputFormat that generates InputSplits given the desired number of trees. Each input split contains a subset of the trees. The number of splits generated equals the number of splits requested (the configured number of maps).
InMemInputFormat.InMemInputSplit Custom InputSplit that indicates how many trees are built by each mapper
InMemInputFormat.InMemRecordReader  
InMemMapper In-memory mapper that grows the trees using a full copy of the data loaded in-memory.
 

Package org.apache.mahout.classifier.df.mapreduce.inmem Description

In-memory mapreduce implementation of Random Decision Forests

Each mapper is responsible for growing a number of trees, with a whole copy of the dataset loaded in memory. It uses the reference implementation's code to build each tree and to estimate the out-of-bag (oob) error.
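As a rough illustration of what estimating the oob error can involve, the sketch below computes a majority-vote error rate over the instances that were out-of-bag for at least one tree. It is not Mahout's code: the class name OobErrorSketch, the -1 marker for in-bag instances, and the majority-vote rule are all assumptions made for this example.

```java
public class OobErrorSketch {

  /**
   * Hypothetical oob error estimate.
   * predictions[t][i] is the label tree t predicted for instance i,
   * or -1 (an assumed marker) if instance i was in tree t's bag.
   */
  public static double oobError(int[][] predictions, int[] labels, int nbLabels) {
    int evaluated = 0;
    int errors = 0;
    for (int i = 0; i < labels.length; i++) {
      int[] votes = new int[nbLabels];
      int total = 0;
      for (int[] tree : predictions) {
        if (tree[i] >= 0) { // only trees for which instance i was out-of-bag vote
          votes[tree[i]]++;
          total++;
        }
      }
      if (total == 0) {
        continue; // instance i was in every bag, so it cannot be evaluated
      }
      int best = 0;
      for (int c = 1; c < nbLabels; c++) {
        if (votes[c] > votes[best]) {
          best = c; // ties keep the lowest label index
        }
      }
      evaluated++;
      if (best != labels[i]) {
        errors++;
      }
    }
    return evaluated == 0 ? Double.NaN : (double) errors / evaluated;
  }

  public static void main(String[] args) {
    int[][] predictions = {{0, -1, 1}, {0, 1, -1}, {1, 1, -1}};
    int[] labels = {0, 1, 0};
    System.out.println(oobError(predictions, labels, 2));
  }
}
```

Instances that are never out-of-bag are skipped rather than counted as errors, which is one reasonable choice among several.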

The dataset is distributed to the slave nodes using the DistributedCache. A custom InputFormat (InMemInputFormat) is configured with the desired number of trees and generates a number of InputSplits equal to the configured number of maps.
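To make the relationship between trees and splits concrete, here is a minimal standalone sketch of dividing a requested number of trees among a given number of splits. The names TreeSplitSketch, TreeRange and partition are hypothetical, and giving the remainder to the last split is an assumption of this sketch rather than a documented guarantee of InMemInputFormat.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeSplitSketch {

  /** Hypothetical stand-in for an input split: a contiguous range of tree ids. */
  public static final class TreeRange {
    public final int firstId;
    public final int nbTrees;

    public TreeRange(int firstId, int nbTrees) {
      this.firstId = firstId;
      this.nbTrees = nbTrees;
    }
  }

  /** Divides nbTrees among numSplits ranges; the last range absorbs the remainder. */
  public static List<TreeRange> partition(int nbTrees, int numSplits) {
    if (nbTrees < 0 || numSplits < 1) {
      throw new IllegalArgumentException("nbTrees must be >= 0 and numSplits >= 1");
    }
    int splitSize = nbTrees / numSplits;
    List<TreeRange> ranges = new ArrayList<>(numSplits);
    int id = 0;
    for (int i = 0; i < numSplits - 1; i++) {
      ranges.add(new TreeRange(id, splitSize));
      id += splitSize;
    }
    ranges.add(new TreeRange(id, nbTrees - id)); // remainder goes to the last range
    return ranges;
  }

  public static void main(String[] args) {
    for (TreeRange r : partition(10, 3)) {
      System.out.println("first tree id " + r.firstId + ", trees " + r.nbTrees);
    }
  }
}
```

With this scheme each mapper receives one range and grows only the trees in it, so the total across all mappers adds up to the requested number of trees.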

There is no need for reducers. Each map outputs the trees it built and, for each tree, the labels the tree predicted for each out-of-bag instance. This step has to be done in the mapper because only there do we know which instances are out-of-bag.
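The reason the mapper is the only place that knows the out-of-bag instances is that it draws the bootstrap sample for each tree. The sketch below shows the idea under stated assumptions: the class OobSketch, the -1 marker for in-bag instances, and the classifier callback are hypothetical and do not reflect Mahout's actual output format.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.IntUnaryOperator;

public class OobSketch {

  /** Draws a bootstrap sample of nbInstances with replacement; true means in-bag. */
  public static boolean[] drawBag(int nbInstances, Random rng) {
    boolean[] inBag = new boolean[nbInstances];
    for (int i = 0; i < nbInstances; i++) {
      inBag[rng.nextInt(nbInstances)] = true;
    }
    return inBag;
  }

  /**
   * Records a prediction for every out-of-bag instance and -1 (assumed marker)
   * for every in-bag instance. classify maps an instance index to a label and
   * stands in for the tree that was just grown.
   */
  public static int[] oobPredictions(boolean[] inBag, IntUnaryOperator classify) {
    int[] out = new int[inBag.length];
    for (int i = 0; i < inBag.length; i++) {
      out[i] = inBag[i] ? -1 : classify.applyAsInt(i);
    }
    return out;
  }

  public static void main(String[] args) {
    boolean[] inBag = drawBag(8, new Random(42L));
    // Toy classifier: predicts label 1 for odd indices and 0 for even ones.
    int[] predictions = oobPredictions(inBag, i -> i % 2);
    System.out.println(Arrays.toString(predictions));
  }
}
```

Once the bag is discarded, the information about which instances were out-of-bag is gone, which is why the predictions are written alongside each tree.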

The Forest builder (InMemBuilder) is responsible for configuring and launching the job. At the end of the job it parses the output files and builds the corresponding DecisionForest.



Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.