org.apache.mahout.vectorizer.encoders
Class StaticWordValueEncoder

java.lang.Object
  extended by org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
      extended by org.apache.mahout.vectorizer.encoders.WordValueEncoder
          extended by org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder
Direct Known Subclasses:
CachingStaticWordValueEncoder

public class StaticWordValueEncoder
extends WordValueEncoder

Encodes a categorical value with an unbounded vocabulary. Values are encoded by incrementing a few locations in the output vector with a weight that either defaults to 1 or is looked up in a weight dictionary. By default, only one probe is used, which should be fine if a large feature vector is used so that the probability of feature collisions is suitably small. Using more probes could slow learning because more features will be non-zero. If a very small feature vector is used, the number of probes should probably be increased to 3.
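
The encoding described above can be illustrated with a minimal, self-contained sketch. It does not use Mahout; the class name HashedWordEncodingSketch, its methods, and the hash mixing are hypothetical stand-ins chosen for illustration, and the real encoder's hash function differs. The sketch only shows the idea of adding a weight to one hashed location per probe.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of hashed word encoding; not Mahout's implementation.
class HashedWordEncodingSketch {

    // Illustrative hash (an assumption): mixes the word bytes, the variable name,
    // and the probe number, then maps the result into [0, vectorSize).
    static int locationForProbe(String name, String word, int probe, int vectorSize) {
        int h = Arrays.hashCode(word.getBytes(StandardCharsets.UTF_8));
        h = 31 * h + name.hashCode();
        h = 31 * h + probe;
        return Math.floorMod(h, vectorSize);
    }

    // Adds the given weight to one location per probe. Two probes may land on
    // the same location, in which case that location is incremented twice.
    static void addToVector(String name, String word, double weight, int probes, double[] vector) {
        for (int probe = 0; probe < probes; probe++) {
            vector[locationForProbe(name, word, probe, vector.length)] += weight;
        }
    }

    public static void main(String[] args) {
        double[] vector = new double[16];
        addToVector("color", "red", 1.0, 2, vector);
        System.out.println(Arrays.toString(vector));
    }
}
```

With more probes, each value touches more locations, so a collision on one location is less likely to hide the value entirely, at the cost of a denser vector.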


Field Summary
 
Fields inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
CONTINUOUS_VALUE_HASH_SEED, WORD_LIKE_VALUE_HASH_SEED
 
Constructor Summary
StaticWordValueEncoder(String name)
           
 
Method Summary
protected  int hashForProbe(byte[] originalForm, int dataSize, String name, int probe)
          Provides the unique hash for a particular probe.
 void setDictionary(Map<String,Double> dictionary)
          Sets the weighting dictionary to be used by this encoder.
 void setMissingValueWeight(double missingValueWeight)
          Sets the weight that is to be used for values that do not appear in the dictionary.
protected  double weight(byte[] originalForm)
           
 
Methods inherited from class org.apache.mahout.vectorizer.encoders.WordValueEncoder
addToVector, asString, getWeight
 
Methods inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
addToVector, addToVector, addToVector, bytesForString, getName, getProbes, hash, hash, hash, hash, hash, hashesForProbe, isTraceEnabled, setProbes, setTraceDictionary, trace, trace
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StaticWordValueEncoder

public StaticWordValueEncoder(String name)

Method Detail

hashForProbe

protected int hashForProbe(byte[] originalForm,
                           int dataSize,
                           String name,
                           int probe)
Description copied from class: FeatureVectorEncoder
Provides the unique hash for a particular probe. For all encoders except text, this is all that is needed and the default implementation of hashesForProbe will do the right thing. For text and similar values, hashesForProbe should be overridden and this method should not be used.

Overrides:
hashForProbe in class WordValueEncoder
Parameters:
originalForm - The original byte array value
dataSize - The length of the vector being encoded
name - The name of the variable being encoded
probe - The probe number
Returns:
The hash of the current probe

setDictionary

public void setDictionary(Map<String,Double> dictionary)
Sets the weighting dictionary to be used by this encoder. Also sets the missing value weight to be half the smallest weight in the dictionary.

Parameters:
dictionary - The dictionary to use to look up weights.
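
The weighting behavior described here can be sketched without Mahout as follows. The class WeightDictionarySketch is hypothetical and uses String keys directly instead of byte arrays. It assumes a non-empty dictionary, and it assumes the default weight of 1 mentioned in the class description applies when no dictionary has been set.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of dictionary-based weighting; not Mahout's implementation.
class WeightDictionarySketch {
    private Map<String, Double> dictionary;
    private double missingValueWeight = 1.0; // assumed default weight

    // Stores the dictionary and sets the missing value weight to half the
    // smallest weight in it. Assumes the dictionary is non-empty;
    // Collections.min throws NoSuchElementException otherwise.
    void setDictionary(Map<String, Double> dictionary) {
        this.dictionary = dictionary;
        this.missingValueWeight = Collections.min(dictionary.values()) / 2;
    }

    // Overrides the weight used for values that are not in the dictionary.
    void setMissingValueWeight(double missingValueWeight) {
        this.missingValueWeight = missingValueWeight;
    }

    // Returns the dictionary weight if present, else the missing value weight.
    double weight(String word) {
        if (dictionary != null && dictionary.containsKey(word)) {
            return dictionary.get(word);
        }
        return missingValueWeight;
    }

    public static void main(String[] args) {
        WeightDictionarySketch sketch = new WeightDictionarySketch();
        Map<String, Double> weights = new HashMap<>();
        weights.put("common", 2.0);
        weights.put("rare", 4.0);
        sketch.setDictionary(weights);
        System.out.println(sketch.weight("rare"));
        System.out.println(sketch.weight("unseen"));
    }
}
```

Because setDictionary resets the missing value weight in this sketch, a custom value passed to setMissingValueWeight needs to be applied after the dictionary is set.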

setMissingValueWeight

public void setMissingValueWeight(double missingValueWeight)
Sets the weight that is to be used for values that do not appear in the dictionary.

Parameters:
missingValueWeight - The default weight for missing values.

weight

protected double weight(byte[] originalForm)
Specified by:
weight in class WordValueEncoder


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.