org.apache.mahout.vectorizer.collocations.llr
Class LLRReducer
java.lang.Object
  org.apache.hadoop.mapreduce.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>
    org.apache.mahout.vectorizer.collocations.llr.LLRReducer
public class LLRReducer
- extends org.apache.hadoop.mapreduce.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>
Reducer for pass 2 of the collocation discovery job. Collects ngram and sub-ngram frequencies and performs
the Log-likelihood ratio calculation.
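The class description says the reducer collects ngram and sub-ngram frequencies. The sketch below illustrates that collection step for a single ngram in plain Java. It assumes each value carries its text, a head/tail flag and a frequency. SubGram and collect are hypothetical stand-ins, not Mahout's Gram API. Throwing on a missing or duplicated subgram is a simplification; a reducer could instead record the reason and skip the entry.

```java
import java.util.List;

/**
 * Sketch: gather the head (A) and tail (B) subgram frequencies for one ngram.
 * SubGram and collect(...) are hypothetical, not Mahout's Gram API.
 */
class SubgramCollector {

    /** A subgram value as it might arrive at the reducer. */
    static final class SubGram {
        final String text;
        final boolean head; // true for the head subgram (A), false for the tail (B)
        final long freq;

        SubGram(String text, boolean head, long freq) {
            this.text = text;
            this.head = head;
            this.freq = freq;
        }
    }

    /**
     * Returns {headFreq, tailFreq}. Throws if either subgram is missing or
     * appears more than once, because the LLR needs exactly one of each.
     */
    static long[] collect(Iterable<SubGram> values) {
        long headFreq = -1; // -1 marks "not seen yet"
        long tailFreq = -1;
        for (SubGram g : values) {
            if (g.head) {
                if (headFreq >= 0) throw new IllegalStateException("extra head: " + g.text);
                headFreq = g.freq;
            } else {
                if (tailFreq >= 0) throw new IllegalStateException("extra tail: " + g.text);
                tailFreq = g.freq;
            }
        }
        if (headFreq < 0) throw new IllegalStateException("missing head");
        if (tailFreq < 0) throw new IllegalStateException("missing tail");
        return new long[] {headFreq, tailFreq};
    }

    public static void main(String[] args) {
        // Values for the bigram "new york": head "new", tail "york" (illustrative counts).
        List<SubGram> values = List.of(
            new SubGram("new", true, 120),
            new SubGram("york", false, 45));
        long[] freqs = collect(values);
        System.out.println(freqs[0] + " " + freqs[1]); // prints "120 45"
    }
}
```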
Nested Class Summary
- static class LLRReducer.ConcreteLLCallback: Concrete implementation that delegates to the LogLikelihood class.
- static interface LLRReducer.LLCallback: Provides an interface so that the input to the LLR calculation can be captured for validation in unit tests.
- static class LLRReducer.Skipped: Counter that tracks why a particular entry was skipped.
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Reducer
org.apache.hadoop.mapreduce.Reducer.Context
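The LLCallback interface exists so that the input to the LLR calculation can be captured in unit tests. The sketch below shows that pattern in plain Java: code under test depends only on a callback interface, and a test injects a recorder that logs the four contingency-table cells before delegating. LlCallback, RecordingLlCallback and score are hypothetical names, and the four-cell method signature is an assumption rather than a copy of Mahout's interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a capturable LLR callback; all names here are hypothetical. */
class LlCallbackSketch {

    /** Receives the four cells of the 2x2 contingency table and returns a score. */
    interface LlCallback {
        double logLikelihoodRatio(long k11, long k12, long k21, long k22);
    }

    /** Test double: records every input, then delegates to the real computation. */
    static final class RecordingLlCallback implements LlCallback {
        private final LlCallback delegate;
        final List<long[]> calls = new ArrayList<>();

        RecordingLlCallback(LlCallback delegate) {
            this.delegate = delegate;
        }

        @Override
        public double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
            calls.add(new long[] {k11, k12, k21, k22});
            return delegate.logLikelihoodRatio(k11, k12, k21, k22);
        }
    }

    /** Code under test: builds the table cells and hands them to whichever callback it was given. */
    static double score(LlCallback callback, long ngramFreq, long headFreq, long tailFreq, long total) {
        return callback.logLikelihoodRatio(
            ngramFreq,                                 // A and B together
            headFreq - ngramFreq,                      // A without B
            tailFreq - ngramFreq,                      // B without A
            total - (headFreq + tailFreq - ngramFreq)); // neither
    }

    public static void main(String[] args) {
        // Placeholder delegate; a real one would compute the log-likelihood ratio.
        RecordingLlCallback recorder = new RecordingLlCallback((a, b, c, d) -> 0.0);
        score(recorder, 10, 30, 20, 1000);
        long[] cells = recorder.calls.get(0);
        System.out.println(cells[0] + " " + cells[1] + " " + cells[2] + " " + cells[3]); // prints "10 20 10 960"
    }
}
```

Keeping the scoring behind an interface means a test can assert on the exact cells without reimplementing the statistic.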
Method Summary
- protected void reduce(Gram ngram, Iterable<Gram> values, org.apache.hadoop.mapreduce.Reducer.Context context)
  Performs the LLR calculation. The input is k: ngram:ngramFreq, v: (h_|t_)subgram:subgramFreq, N = ngram total.
  Each ngram has two subgrams, a head and a tail, referred to as A and B respectively below.
- protected void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
Methods inherited from class org.apache.hadoop.mapreduce.Reducer
cleanup, run
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
NGRAM_TOTAL
public static final String NGRAM_TOTAL
- See Also:
- Constant Field Values
MIN_LLR
public static final String MIN_LLR
- See Also:
- Constant Field Values
DEFAULT_MIN_LLR
public static final float DEFAULT_MIN_LLR
- See Also:
- Constant Field Values
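MIN_LLR and DEFAULT_MIN_LLR are not described on this page; their names suggest a minimum LLR score below which an ngram is not emitted, and the Skipped counter tracks why entries are dropped. The plain-Java sketch below assumes exactly that behaviour. The skip reasons, the class name and the 1.0f threshold in the demo are hypothetical and illustrative, not necessarily Mahout's values.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: emit ngrams whose LLR meets a minimum, and count why others were skipped. */
class MinLlrFilterSketch {

    /** Hypothetical skip reasons, in the spirit of the LLRReducer.Skipped counter. */
    enum SkipReason { LESS_THAN_MIN_LLR, LLR_CALCULATION_ERROR }

    private final float minLlr;
    final Map<SkipReason, Long> skipped = new EnumMap<>(SkipReason.class);
    final Map<String, Double> emitted = new LinkedHashMap<>();

    MinLlrFilterSketch(float minLlr) {
        this.minLlr = minLlr;
    }

    /** Emits the ngram if its score is finite and at least minLlr; otherwise counts the reason. */
    void offer(String ngram, double llr) {
        if (Double.isNaN(llr) || Double.isInfinite(llr)) {
            skipped.merge(SkipReason.LLR_CALCULATION_ERROR, 1L, Long::sum);
        } else if (llr < minLlr) {
            skipped.merge(SkipReason.LESS_THAN_MIN_LLR, 1L, Long::sum);
        } else {
            emitted.put(ngram, llr);
        }
    }

    public static void main(String[] args) {
        // 1.0f is an illustrative threshold, not necessarily the value of DEFAULT_MIN_LLR.
        MinLlrFilterSketch filter = new MinLlrFilterSketch(1.0f);
        filter.offer("new york", 42.7);
        filter.offer("of the", 0.3);
        filter.offer("bad entry", Double.NaN);
        System.out.println(filter.emitted); // prints "{new york=42.7}"
        System.out.println(filter.skipped); // prints "{LESS_THAN_MIN_LLR=1, LLR_CALCULATION_ERROR=1}"
    }
}
```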
LLRReducer
public LLRReducer()
reduce
protected void reduce(Gram ngram,
Iterable<Gram> values,
org.apache.hadoop.mapreduce.Reducer.Context context)
throws IOException,
InterruptedException
- Performs the LLR calculation. The input is:
  k: ngram:ngramFreq
  v: (h_|t_)subgram:subgramFreq
  N = ngram total
  Each ngram has two subgrams, a head and a tail, referred to as A and B respectively below. The four cells of the contingency table are:
  A+B: number of times A and B appear together: ngramFreq
  A+!B: number of times A appears without B: hSubgramFreq - ngramFreq
  !A+B: number of times B appears without A: tSubgramFreq - ngramFreq
  !A+!B: number of times neither A nor B appears (in that order): N - (hSubgramFreq + tSubgramFreq - ngramFreq)
- Overrides:
reduce
in class org.apache.hadoop.mapreduce.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>
- Throws:
IOException
InterruptedException
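The reduce description defines the four cells of a 2x2 contingency table. The plain-Java sketch below builds that table from ngramFreq, the head and tail subgram frequencies and N, then scores it with Dunning's G² log-likelihood ratio in its common entropy form, 2 * (rowEntropy + columnEntropy - matrixEntropy), where entropy(k) = N ln N - sum(k ln k). This is equivalent to G² = 2 * sum k_ij ln(k_ij N / (R_i C_j)). It is written for illustration and is not copied from Mahout's LogLikelihood class; the class and method names are hypothetical.

```java
import java.util.Locale;

/** Sketch: contingency table from reducer inputs, scored with G^2 (entropy form). */
class LlrSketch {

    /** x * ln(x), with the convention 0 * ln(0) = 0. */
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy: N ln N - sum(k ln k), where N = sum(k). */
    private static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long k : counts) {
            sum += k;
            parts += xLogX(k);
        }
        return xLogX(sum) - parts;
    }

    /** G^2 for the cells k11 = A+B, k12 = A+!B, k21 = !A+B, k22 = !A+!B. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** Builds the table exactly as the reduce description defines it, then scores it. */
    static double score(long ngramFreq, long headFreq, long tailFreq, long ngramTotal) {
        long k11 = ngramFreq;                                      // A and B together
        long k12 = headFreq - ngramFreq;                           // A without B
        long k21 = tailFreq - ngramFreq;                           // B without A
        long k22 = ngramTotal - (headFreq + tailFreq - ngramFreq); // neither A nor B
        return logLikelihoodRatio(k11, k12, k21, k22);
    }

    public static void main(String[] args) {
        // Perfect association: A and B only ever occur together.
        System.out.println(String.format(Locale.ROOT, "%.4f", logLikelihoodRatio(10, 0, 0, 10))); // prints 27.7259
        // Independence: every cell equals its row total times column total over N.
        System.out.println(String.format(Locale.ROOT, "%.4f", logLikelihoodRatio(10, 10, 10, 10))); // prints 0.0000
        // Reducer-style inputs (illustrative counts).
        System.out.println(String.format(Locale.ROOT, "%.4f", score(10, 30, 20, 1000)));
    }
}
```

The entropy form avoids computing expected counts explicitly and handles zero cells through the 0 * ln(0) = 0 convention.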
setup
protected void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
throws IOException,
InterruptedException
- Overrides:
setup
in class org.apache.hadoop.mapreduce.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>
- Throws:
IOException
InterruptedException
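The page does not document what setup does. Because NGRAM_TOTAL and MIN_LLR are String constants and reduce needs N (the ngram total), one plausible reading is that setup reads these two values from the job configuration, falling back to DEFAULT_MIN_LLR for the threshold. The sketch below assumes that. In a Hadoop reducer the values would come from context.getConfiguration() via Configuration.getLong and Configuration.getFloat; here a plain Map stands in so the example runs without Hadoop. The key strings are hypothetical placeholders; the real values are listed on the Constant Field Values page.

```java
import java.util.Map;

/** Sketch of reading reducer settings; key strings and class name are hypothetical. */
class SetupSketch {
    // Placeholder keys, not the real values of LLRReducer.NGRAM_TOTAL / LLRReducer.MIN_LLR.
    static final String NGRAM_TOTAL = "example.ngramTotal";
    static final String MIN_LLR = "example.minLLR";

    final long ngramTotal;
    final float minLlr;

    SetupSketch(Map<String, String> conf, float defaultMinLlr) {
        String total = conf.get(NGRAM_TOTAL);
        if (total == null) {
            // N is required for the !A+!B cell, so fail fast if it is absent.
            throw new IllegalStateException("missing required setting " + NGRAM_TOTAL);
        }
        this.ngramTotal = Long.parseLong(total);
        String min = conf.get(MIN_LLR);
        this.minLlr = (min == null) ? defaultMinLlr : Float.parseFloat(min);
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(NGRAM_TOTAL, "1000");
        SetupSketch s = new SetupSketch(conf, 1.0f); // 1.0f: illustrative default
        System.out.println(s.ngramTotal + " " + s.minLlr); // prints "1000 1.0"
    }
}
```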
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.