org.apache.mahout.text
Class WikipediaToSequenceFile

java.lang.Object
  extended by org.apache.mahout.text.WikipediaToSequenceFile

public final class WikipediaToSequenceFile
extends Object

Creates and runs the Wikipedia Dataset Creator.


Method Summary
static void main(String[] args)
          Takes two arguments: (1) the input Path where the input documents live; (2) the output Path where the classifier is written as a SequenceFile.
static void runJob(String input, String output, String catFile, boolean exactMatchOnly, boolean all)
          Run the job
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(String[] args)
                 throws IOException
Takes two arguments:
  1. The input Path where the input documents live
  2. The output Path where the classifier is written as a SequenceFile

Throws:
IOException

runJob

public static void runJob(String input,
                          String output,
                          String catFile,
                          boolean exactMatchOnly,
                          boolean all)
                   throws IOException,
                          InterruptedException,
                          ClassNotFoundException
Runs the Wikipedia dataset creation job.

Parameters:
input - the input pathname String
output - the output pathname String
catFile - the file containing the Wikipedia categories
exactMatchOnly - if true, the Wikipedia category must match exactly; if false, it only needs to contain the category string
all - if true, select all categories
Throws:
IOException
InterruptedException
ClassNotFoundException
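
The difference between the exactMatchOnly and all flags described above can be illustrated with a small, self-contained sketch. It is not Mahout's implementation: the class CategoryMatchSketch, the method isSelected, and the lower-casing of the document category are assumptions made for illustration only, and the wanted categories are assumed to be stored in lower case already.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only; hypothetical names, not part of Mahout.
final class CategoryMatchSketch {

  private CategoryMatchSketch() { }

  /**
   * Decides whether a document category is selected.
   *
   * @param documentCategory category label found on a document
   * @param wanted           categories of interest (assumed lower case)
   * @param exactMatchOnly   if true, require equality instead of containment
   * @param all              if true, select every category
   */
  static boolean isSelected(String documentCategory, Set<String> wanted,
                            boolean exactMatchOnly, boolean all) {
    if (all) {
      return true; // "all" short-circuits any matching
    }
    // Assumption: comparison is case-insensitive and ignores outer whitespace.
    String candidate = documentCategory.toLowerCase(Locale.ROOT).trim();
    if (exactMatchOnly) {
      return wanted.contains(candidate); // whole label must equal a wanted category
    }
    for (String category : wanted) {
      if (candidate.contains(category)) {
        return true; // label merely contains a wanted category string
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Set<String> wanted = new HashSet<>(Arrays.asList("history", "science"));
    System.out.println(isSelected("History of science", wanted, false, false)); // true
    System.out.println(isSelected("History of science", wanted, true, false));  // false
    System.out.println(isSelected("Cooking", wanted, true, true));              // true
  }
}
```

With exactMatchOnly set to false, a label such as "History of science" is selected because it contains "history"; with exactMatchOnly set to true, it is rejected because it does not equal any wanted category.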


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.