org.apache.mahout.vectorizer.encoders
Class StaticWordValueEncoder
java.lang.Object
org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
org.apache.mahout.vectorizer.encoders.WordValueEncoder
org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder
- Direct Known Subclasses:
- CachingStaticWordValueEncoder
public class StaticWordValueEncoder
- extends WordValueEncoder
Encodes a categorical value with an unbounded vocabulary. Values are encoded by incrementing a
few locations in the output vector with a weight that either defaults to 1 or is looked
up in a weight dictionary. By default, more than one probe is used, which should be fine but could
slow learning because more features will be non-zero. If a large
feature vector is used so that the probability of feature collisions is suitably small, then the
number of probes can be decreased to 1. If a very small feature vector is used, the number of probes should
probably be increased to 3.
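As a rough illustration of the probe idea described above, the following minimal sketch adds a weight to one hashed location per probe. It is not Mahout's implementation: the class ProbeEncodingSketch, its methods, and the simple hash mixing are hypothetical stand-ins chosen only to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: a stand-in for multi-probe hashed encoding, not Mahout's code. */
final class ProbeEncodingSketch {
    private ProbeEncodingSketch() {}

    /**
     * Hypothetical hash: mixes the variable name, the value bytes and the probe
     * number, then maps the result into the range [0, dataSize).
     */
    static int locationFor(String name, byte[] value, int dataSize, int probe) {
        int h = 31 * name.hashCode() + Arrays.hashCode(value);
        h = 31 * h + probe;
        return Math.floorMod(h, dataSize);
    }

    /** Adds the given weight to one vector location per probe. */
    static void encode(String name, String word, double weight, int probes, double[] vector) {
        byte[] bytes = word.getBytes(StandardCharsets.UTF_8);
        for (int probe = 0; probe < probes; probe++) {
            vector[locationFor(name, bytes, vector.length, probe)] += weight;
        }
    }
}
```

With more probes, each value touches more locations, so a collision at one location is less likely to hide the value entirely, at the cost of more non-zero features.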
Method Summary
protected int hashForProbe(byte[] originalForm, int dataSize, String name, int probe)
- Provides the unique hash for a particular probe.
void setDictionary(Map<String,Double> dictionary)
- Sets the weighting dictionary to be used by this encoder.
void setMissingValueWeight(double missingValueWeight)
- Sets the weight that is to be used for values that do not appear in the dictionary.
protected double weight(byte[] originalForm)
Methods inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
addToVector, addToVector, addToVector, bytesForString, getName, getProbes, hash, hash, hash, hash, hash, hashesForProbe, isTraceEnabled, setProbes, setTraceDictionary, trace, trace
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
StaticWordValueEncoder
public StaticWordValueEncoder(String name)
hashForProbe
protected int hashForProbe(byte[] originalForm,
int dataSize,
String name,
int probe)
- Description copied from class:
FeatureVectorEncoder
- Provides the unique hash for a particular probe. For all encoders except text, this
is all that is needed and the default implementation of hashesForProbe will do the right
thing. For text and similar values, hashesForProbe should be overridden and this method
should not be used.
- Overrides:
hashForProbe
in class WordValueEncoder
- Parameters:
originalForm
- The original byte array value
dataSize
- The length of the vector being encoded
name
- The name of the variable being encoded
probe
- The probe number
- Returns:
- The hash of the current probe
setDictionary
public void setDictionary(Map<String,Double> dictionary)
- Sets the weighting dictionary to be used by this encoder. Also sets the missing value weight
to be half the smallest weight in the dictionary.
- Parameters:
dictionary
- The dictionary to use to look up weights.
setMissingValueWeight
public void setMissingValueWeight(double missingValueWeight)
- Sets the weight that is to be used for values that do not appear in the dictionary.
- Parameters:
missingValueWeight
- The default weight for missing values.
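The interplay between the weighting dictionary and the missing value weight described above can be sketched as follows. This is a minimal, self-contained illustration rather than the Mahout source: the class WeightDictionarySketch is hypothetical, and both the initial missing value weight of 1 and the handling of an empty dictionary are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: mimics the documented dictionary behaviour, not Mahout's code. */
final class WeightDictionarySketch {
    private Map<String, Double> dictionary = new HashMap<>();
    // Assumption: with no dictionary set, every value gets a weight of 1.
    private double missingValueWeight = 1.0;

    /** Stores the dictionary and resets the missing value weight to half its smallest weight. */
    void setDictionary(Map<String, Double> dictionary) {
        this.dictionary = dictionary;
        // Assumption: an empty dictionary leaves the missing value weight unchanged.
        if (!dictionary.isEmpty()) {
            missingValueWeight = Collections.min(dictionary.values()) / 2;
        }
    }

    /** Overrides the weight used for values that do not appear in the dictionary. */
    void setMissingValueWeight(double missingValueWeight) {
        this.missingValueWeight = missingValueWeight;
    }

    /** Looks up the weight for a value, falling back to the missing value weight. */
    double weight(byte[] originalForm) {
        String word = new String(originalForm, StandardCharsets.UTF_8);
        return dictionary.getOrDefault(word, missingValueWeight);
    }
}
```

In this sketch, a dictionary with weights 1.0 and 0.5 gives unseen words a weight of 0.25 until setMissingValueWeight is called. Because setDictionary resets the missing value weight, a custom value would need to be set after the dictionary, not before.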
weight
protected double weight(byte[] originalForm)
- Specified by:
weight
in class WordValueEncoder
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.