org.apache.mahout.vectorizer.collocations.llr
Class LLRReducer

java.lang.Object
  extended by org.apache.hadoop.mapreduce.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>
      extended by org.apache.mahout.vectorizer.collocations.llr.LLRReducer

public class LLRReducer
extends org.apache.hadoop.mapreduce.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>

Reducer for pass 2 of the collocation discovery job. Collects ngram and sub-ngram frequencies and performs the log-likelihood ratio (LLR) calculation.


Nested Class Summary
static class LLRReducer.ConcreteLLCallback
          Concrete implementation that delegates to the LogLikelihood class.
static interface LLRReducer.LLCallback
          Interface that allows the input to the LLR calculation to be captured for validation in unit tests.
static class LLRReducer.Skipped
          Counter to track why a particular entry was skipped
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Reducer
org.apache.hadoop.mapreduce.Reducer.Context
 
Field Summary
static float DEFAULT_MIN_LLR
           
static String MIN_LLR
           
static String NGRAM_TOTAL
           
 
Constructor Summary
LLRReducer()
           
 
Method Summary
protected  void reduce(Gram ngram, Iterable<Gram> values, org.apache.hadoop.mapreduce.Reducer.Context context)
          Perform the LLR calculation. Input is k:ngram:ngramFreq, v:(h_|t_)subgram:subgramfreq, with N = ngram total. Each ngram has 2 subgrams, a head and a tail, referred to as A and B respectively.
protected  void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
           
 
Methods inherited from class org.apache.hadoop.mapreduce.Reducer
cleanup, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NGRAM_TOTAL

public static final String NGRAM_TOTAL
See Also:
Constant Field Values

MIN_LLR

public static final String MIN_LLR
See Also:
Constant Field Values

DEFAULT_MIN_LLR

public static final float DEFAULT_MIN_LLR
See Also:
Constant Field Values

Constructor Detail

LLRReducer

public LLRReducer()

Method Detail

reduce

protected void reduce(Gram ngram,
                      Iterable<Gram> values,
                      org.apache.hadoop.mapreduce.Reducer.Context context)
               throws IOException,
                      InterruptedException
Perform the LLR calculation. The input is:

k:ngram:ngramFreq
v:(h_|t_)subgram:subgramfreq
N = ngram total

Each ngram will have 2 subgrams, a head and a tail, referred to as A and B respectively below.

 A+ B: number of times A and B appear together: ngramFreq
 A+!B: number of times A appears without B: hSubgramFreq - ngramFreq
!A+ B: number of times B appears without A: tSubgramFreq - ngramFreq
!A+!B: number of times neither A nor B appears (in that order): N - (hSubgramFreq + tSubgramFreq - ngramFreq)

Overrides:
reduce in class org.apache.hadoop.mapreduce.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>
Throws:
IOException
InterruptedException
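
The four counts described for reduce form a 2x2 contingency table, and the arithmetic can be illustrated with a small standalone sketch. It is not the Mahout source: the class LlrSketch and all of its method names are hypothetical, it has no Hadoop or Mahout dependencies, and the LLR is computed with the common entropy-based formulation of the log-likelihood ratio statistic, which is assumed here rather than taken from this page. The sketch also throws on inconsistent counts, whereas the real reducer may handle such entries differently (for example by counting them as skipped).

```java
/**
 * Hypothetical, dependency-free sketch of the contingency-table arithmetic
 * described for LLRReducer.reduce. Not the Mahout implementation.
 */
final class LlrSketch {
    private LlrSketch() {}

    /** Returns {A+B, A+!B, !A+B, !A+!B} built from the raw frequencies. */
    static long[] contingency(long ngramFreq, long headFreq, long tailFreq, long ngramTotal) {
        long k11 = ngramFreq;                                       // A and B together
        long k12 = headFreq - ngramFreq;                            // A without B
        long k21 = tailFreq - ngramFreq;                            // B without A
        long k22 = ngramTotal - (headFreq + tailFreq - ngramFreq);  // neither A nor B
        if (k12 < 0 || k21 < 0 || k22 < 0) {
            // Assumption of this sketch: inconsistent counts are rejected outright.
            throw new IllegalArgumentException("inconsistent frequencies");
        }
        return new long[] {k11, k12, k21, k22};
    }

    /** x * ln(x), with the usual convention that 0 * ln(0) = 0. */
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized Shannon entropy of a set of counts. */
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            parts += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - parts;
    }

    /** Log-likelihood ratio (G^2) of a 2x2 table, clamped at zero against round-off. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0;
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    public static void main(String[] args) {
        // Made-up frequencies: ngram seen 10 times, head 20, tail 30, N = 1000.
        long[] k = contingency(10, 20, 30, 1000);
        double llr = logLikelihoodRatio(k[0], k[1], k[2], k[3]);
        System.out.println(java.util.Arrays.toString(k) + " -> LLR " + llr);
    }
}
```

With the made-up frequencies in main, the four cells sum to N, which is a quick sanity check on the !A+!B formula. A table whose cells are exactly what independence of A and B would predict yields an LLR of zero, and stronger association between head and tail yields larger values.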

setup

protected void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
              throws IOException,
                     InterruptedException
Overrides:
setup in class org.apache.hadoop.mapreduce.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>
Throws:
IOException
InterruptedException


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.