org.apache.mahout.vectorizer.encoders
Class LuceneTextValueEncoder

java.lang.Object
  extended by org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
      extended by org.apache.mahout.vectorizer.encoders.TextValueEncoder
          extended by org.apache.mahout.vectorizer.encoders.LuceneTextValueEncoder

public class LuceneTextValueEncoder
extends TextValueEncoder

Encodes text using a Lucene-style tokenizer.
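The class description is brief. The stdlib-only Java sketch below shows the general idea behind text encoders of this kind: split the text into tokens, then hash each token into a slot of a fixed-size vector. It is not Mahout's code, and several pieces are simplifications. Tokenization is approximated by lowercasing and splitting on non-alphanumeric characters, whereas a real Lucene analyzer does considerably more. `String.hashCode()` stands in for Mahout's own hash functions and seeds. Each token also uses a single probe. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical, simplified sketch of hashed text encoding (NOT Mahout's implementation).
 * String.hashCode() and a single probe stand in for Mahout's hashing, seeds and probes.
 */
final class HashedTextSketch {

    /** Rough "Lucene-style" tokenization: lowercase, split on runs of non-letters/non-digits. */
    static List<String> tokenize(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toString().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) { // a leading delimiter yields an empty first element
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Adds `weight` to one hashed slot of `vector` for every token occurrence. */
    static void addToVector(CharSequence text, double weight, double[] vector) {
        for (String token : tokenize(text)) {
            int index = Math.floorMod(token.hashCode(), vector.length); // never negative
            vector[index] += weight;
        }
    }

    public static void main(String[] args) {
        String text = "The quick brown fox; the lazy dog.";
        double[] v = new double[16];
        addToVector(text, 1.0, v);
        double total = 0;
        for (double x : v) {
            total += x;
        }
        System.out.println(tokenize(text));          // [the, quick, brown, fox, the, lazy, dog]
        System.out.println("total weight = " + total); // total weight = 7.0
    }
}
```

Repeated tokens such as "the" land in the same slot, so their weight accumulates. Distinct tokens may also collide in a small vector. This trade-off between vector size and collisions is inherent to feature hashing.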

See Also:
TextValueEncoder

Field Summary
 
Fields inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
CONTINUOUS_VALUE_HASH_SEED, WORD_LIKE_VALUE_HASH_SEED
 
Constructor Summary
LuceneTextValueEncoder(String name)
           
 
Method Summary
 void setAnalyzer(org.apache.lucene.analysis.Analyzer analyzer)
           
protected  Iterable<String> tokenize(CharSequence originalForm)
          Tokenizes a string using the simplest method.
 
Methods inherited from class org.apache.mahout.vectorizer.encoders.TextValueEncoder
addText, addText, addToVector, asString, flush, hashesForProbe, hashForProbe, setWordEncoder
 
Methods inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
addToVector, addToVector, addToVector, bytesForString, getName, getProbes, getWeight, hash, hash, hash, hash, hash, isTraceEnabled, setProbes, setTraceDictionary, trace, trace
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LuceneTextValueEncoder

public LuceneTextValueEncoder(String name)

Method Detail

setAnalyzer

public void setAnalyzer(org.apache.lucene.analysis.Analyzer analyzer)
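`setAnalyzer` lets the caller choose how text is broken into tokens. The sketch below illustrates that pluggable design using only the standard library. A hypothetical `Function<CharSequence, List<String>>` plays the role of `org.apache.lucene.analysis.Analyzer`, and swapping it changes the tokens produced. With the real class you would instead pass a Lucene analyzer, for example `StandardAnalyzer`, whose constructor arguments depend on the Lucene version. Nothing here reproduces Mahout's internals.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of a pluggable tokenizer, in the spirit of setAnalyzer(Analyzer).
 * The Function-typed "analyzer" is a stand-in, not the Lucene Analyzer API.
 */
class PluggableEncoderSketch {
    // Default strategy: plain whitespace splitting.
    private Function<CharSequence, List<String>> analyzer = PluggableEncoderSketch::whitespace;

    static List<String> whitespace(CharSequence text) {
        return Arrays.stream(text.toString().trim().split("\\s+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Replaces the tokenization strategy used by tokenize(). */
    void setAnalyzer(Function<CharSequence, List<String>> analyzer) {
        this.analyzer = analyzer;
    }

    /** Delegates to whichever strategy is currently configured. */
    List<String> tokenize(CharSequence text) {
        return analyzer.apply(text);
    }

    public static void main(String[] args) {
        PluggableEncoderSketch enc = new PluggableEncoderSketch();
        String text = "Hello, World! hello world";
        System.out.println(enc.tokenize(text)); // [Hello,, World!, hello, world]

        // Swap in a lowercasing, punctuation-stripping strategy.
        enc.setAnalyzer(t -> Arrays.stream(t.toString().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList()));
        System.out.println(enc.tokenize(text)); // [hello, world, hello, world]
    }
}
```

Keeping the tokenizer behind a setter means the hashing and weighting code never needs to know which analyzer produced the tokens.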

tokenize

protected Iterable<String> tokenize(CharSequence originalForm)
Tokenizes a string using the simplest method. This should be overridden for more subtle tokenization.

Overrides:
tokenize in class TextValueEncoder
See Also:
LuceneTextValueEncoder
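Because `tokenize` is a protected hook, a subclass can replace the default tokenization while the inherited encoding logic stays unchanged. The plain-Java sketch below shows that override pattern. `SimpleTextEncoder`, `NormalizingTextEncoder`, `termCounts`, and the stop-word list are hypothetical and are not part of Mahout. Only the `tokenize` signature mirrors the one documented above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical base class with a deliberately simple tokenize() hook. */
class SimpleTextEncoder {
    /** Default tokenization: split on whitespace only (case and punctuation are kept). */
    protected Iterable<String> tokenize(CharSequence originalForm) {
        List<String> out = new ArrayList<>();
        for (String t : originalForm.toString().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Counts term frequencies using whatever tokenize() the runtime class provides. */
    Map<String, Integer> termCounts(CharSequence text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }
}

/** Hypothetical subclass that overrides tokenize() with "more subtle" rules. */
class NormalizingTextEncoder extends SimpleTextEncoder {
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "of");

    @Override
    protected Iterable<String> tokenize(CharSequence originalForm) {
        List<String> out = new ArrayList<>();
        // Lowercase, split on non-alphanumerics, then drop stop words.
        for (String t : originalForm.toString().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }
}

class TokenizeOverrideDemo {
    public static void main(String[] args) {
        String text = "The Art of Computer Programming, the art";
        System.out.println(new SimpleTextEncoder().termCounts(text).size());        // 7
        Map<String, Integer> counts = new NormalizingTextEncoder().termCounts(text);
        System.out.println(counts.size());                                          // 3
        System.out.println(counts.get("art"));                                      // 2
    }
}
```

`termCounts` is written once in the base class, yet it produces different features depending on which `tokenize` override is in effect. This is the same template-method arrangement the inherited documentation describes.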


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.