org.apache.mahout.utils.email
Class MailProcessor

java.lang.Object
  extended by org.apache.mahout.utils.email.MailProcessor

public class MailProcessor
extends Object

Converts an mbox mail archive into a group of Hadoop Sequence Files of equal size. The archive may optionally be gzipped or zipped.

See Also:
org.apache.mahout.text.SequenceFilesFromMailArchives


Field Summary
static Pattern FROM_PREFIX
           
static Pattern REFS_PREFIX
           
static Pattern SUBJECT_PREFIX
           
static Pattern TO_PREFIX
           
 
Constructor Summary
MailProcessor(MailOptions options, String prefix, ChunkedWriter writer)
          Main constructor: creates a MailProcessor that writes to sequence files through the given ChunkedWriter.
MailProcessor(MailOptions options, String prefix, Writer writer)
          Creates a MailProcessor that does not write to sequence files, but to a single text file.
 
Method Summary
protected static String generateKey(File mboxFile, String prefix, String messageId)
           
 MailOptions getOptions()
           
 String getPrefix()
           
 long parseMboxLineByLine(File mboxFile)
          Parses one complete mail archive, writing output to the writer that was passed to the constructor.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SUBJECT_PREFIX

public static final Pattern SUBJECT_PREFIX

FROM_PREFIX

public static final Pattern FROM_PREFIX

REFS_PREFIX

public static final Pattern REFS_PREFIX

TO_PREFIX

public static final Pattern TO_PREFIX
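
This page does not list the regular expressions behind the four prefix patterns. As a rough illustration of how such header-prefix patterns are typically applied to a single line of a mail message, here is a minimal sketch. The class name HeaderPrefixSketch, the helper headerValue, and the expressions themselves are assumptions made for this example; they are not taken from the MailProcessor source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderPrefixSketch {
  // Assumed expressions: the real values of the MailProcessor constants
  // are not shown on this page.
  static final Pattern SUBJECT_PREFIX =
      Pattern.compile("^subject: (.*)$", Pattern.CASE_INSENSITIVE);
  static final Pattern FROM_PREFIX =
      Pattern.compile("^from: (\\S.*)$", Pattern.CASE_INSENSITIVE);
  static final Pattern REFS_PREFIX =
      Pattern.compile("^references: (.*)$", Pattern.CASE_INSENSITIVE);
  static final Pattern TO_PREFIX =
      Pattern.compile("^to: (.*)$", Pattern.CASE_INSENSITIVE);

  /** Returns the header value if the whole line matches the pattern, otherwise null. */
  static String headerValue(Pattern prefix, String line) {
    Matcher m = prefix.matcher(line);
    return m.matches() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    // Header names are matched case-insensitively.
    System.out.println(headerValue(SUBJECT_PREFIX, "Subject: Re: clustering mail"));
    System.out.println(headerValue(FROM_PREFIX, "From: alice@example.org"));
    // A line with a different header yields null.
    System.out.println(headerValue(TO_PREFIX, "X-Mailer: none"));
  }
}
```

Keeping one compiled Pattern per header avoids recompiling the expression for every line of a large archive.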

Constructor Detail

MailProcessor

public MailProcessor(MailOptions options,
                     String prefix,
                     Writer writer)
Creates a MailProcessor that does not write to sequence files, but to a single text file. This constructor is for debugging and testing purposes.


MailProcessor

public MailProcessor(MailOptions options,
                     String prefix,
                     ChunkedWriter writer)
Main constructor: creates a MailProcessor that writes to sequence files through the given ChunkedWriter.

Method Detail

parseMboxLineByLine

public long parseMboxLineByLine(File mboxFile)
                         throws IOException
Parses one complete mail archive, writing output to the writer that was passed to the constructor.

Parameters:
mboxFile - mail archive to parse
Returns:
number of parsed mails
Throws:
IOException
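
The general shape of a line-by-line mbox parse that writes to a supplied writer and returns a message count can be sketched as follows. This is not the MailProcessor implementation: the class MboxLineParserSketch, its methods, the output layout, and the rule that a message starts at any line beginning with "From " are all assumptions chosen to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class MboxLineParserSketch {
  /** Reads mbox text line by line, copies message lines to out, returns the message count. */
  static long parse(Reader mbox, Writer out) throws IOException {
    long count = 0;
    boolean inMessage = false;
    BufferedReader reader = new BufferedReader(mbox);
    String line;
    while ((line = reader.readLine()) != null) {
      if (line.startsWith("From ")) {
        // Assumed delimiter: the classic mbox separator line opens a new message.
        count++;
        inMessage = true;
        out.write("--- message " + count + " ---\n");
      } else if (inMessage) {
        out.write(line);
        out.write('\n');
      }
    }
    out.flush();
    return count;
  }

  /** Convenience wrapper for in-memory text; output is discarded. */
  static long countMessages(String mboxText) {
    try {
      return parse(new StringReader(mboxText), new StringWriter());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    String mbox =
        "From a@example.org Mon Jan  6 10:00:00 2014\n"
            + "Subject: one\n\nfirst body\n"
            + "From b@example.org Tue Jan  7 11:00:00 2014\n"
            + "Subject: two\n\nsecond body\n";
    StringWriter out = new StringWriter();
    long parsed = parse(new StringReader(mbox), out);
    System.out.println("parsed mails: " + parsed);
    System.out.print(out);
  }
}
```

The sketch relies on the archive escaping body lines that begin with "From " (commonly as ">From "); an unescaped body line of that form would be counted as a new message.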

generateKey

protected static String generateKey(File mboxFile,
                                    String prefix,
                                    String messageId)

getPrefix

public String getPrefix()

getOptions

public MailOptions getOptions()


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.