org.apache.mahout.text.wikipedia
Class XmlInputFormat.XmlRecordReader

java.lang.Object
  extended by org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
      extended by org.apache.mahout.text.wikipedia.XmlInputFormat.XmlRecordReader
All Implemented Interfaces:
Closeable
Enclosing class:
XmlInputFormat

public static class XmlInputFormat.XmlRecordReader
extends org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>

A RecordReader that scans a given XML document and emits each XML block delimited by the specified start tag and end tag as one record. Keys are LongWritable and values are Text.
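The tag-delimited record extraction described above can be illustrated with a minimal, standard-library-only sketch. It is not the Mahout implementation: the class name XmlBlockScanner and the methods readUntilMatch and extractBlocks are hypothetical names chosen for this example, and the sketch ignores input splits, keys, and Hadoop types entirely. It only shows the general idea of skipping bytes until a start tag is seen, then buffering bytes until the matching end tag.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Mahout or Hadoop.
public class XmlBlockScanner {

    /**
     * Reads bytes until the byte sequence `match` has been seen.
     * Returns false if the stream ends first. When `buffer` is non-null,
     * every byte read (including the matched tag) is copied into it.
     * This simple matcher assumes the first byte of `match` (for example '<')
     * does not occur again inside `match`, which holds for ordinary tags.
     */
    public static boolean readUntilMatch(InputStream in, byte[] match,
                                         ByteArrayOutputStream buffer) throws IOException {
        int matched = 0;
        while (true) {
            int b = in.read();
            if (b == -1) {
                return false; // end of stream before a full match
            }
            if (buffer != null) {
                buffer.write(b);
            }
            if (b == match[matched]) {
                matched++;
                if (matched == match.length) {
                    return true;
                }
            } else {
                // Restart, but let the current byte begin a new match attempt.
                matched = (b == match[0]) ? 1 : 0;
            }
        }
    }

    /** Returns every block that begins with startTag and ends with endTag. */
    public static List<String> extractBlocks(InputStream in, String startTag,
                                             String endTag) throws IOException {
        byte[] start = startTag.getBytes(StandardCharsets.UTF_8);
        byte[] end = endTag.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        // Skip everything up to the next start tag, then buffer up to the end tag.
        while (readUntilMatch(in, start, null)) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            buffer.write(start, 0, start.length);
            if (!readUntilMatch(in, end, buffer)) {
                break; // unterminated block: drop it
            }
            records.add(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String xml = "<doc><page>first</page>junk<page>second</page></doc>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String record : extractBlocks(in, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

Run as written, main prints the two blocks <page>first</page> and <page>second</page>, one per line; the text between blocks is discarded. In a real record reader each block would be exposed through nextKeyValue(), getCurrentKey(), and getCurrentValue() rather than collected in a list.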


Constructor Summary
XmlInputFormat.XmlRecordReader(org.apache.hadoop.mapreduce.lib.input.FileSplit split, org.apache.hadoop.conf.Configuration conf)
           
 
Method Summary
 void close()
           
 org.apache.hadoop.io.LongWritable getCurrentKey()
           
 org.apache.hadoop.io.Text getCurrentValue()
           
 float getProgress()
           
 void initialize(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
           
 boolean nextKeyValue()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XmlInputFormat.XmlRecordReader

public XmlInputFormat.XmlRecordReader(org.apache.hadoop.mapreduce.lib.input.FileSplit split,
                                      org.apache.hadoop.conf.Configuration conf)
                               throws IOException
Throws:
IOException
Method Detail

close

public void close()
           throws IOException
Specified by:
close in interface Closeable
Specified by:
close in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException

getProgress

public float getProgress()
                  throws IOException
Specified by:
getProgress in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException

getCurrentKey

public org.apache.hadoop.io.LongWritable getCurrentKey()
                                                throws IOException,
                                                       InterruptedException
Specified by:
getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

getCurrentValue

public org.apache.hadoop.io.Text getCurrentValue()
                                          throws IOException,
                                                 InterruptedException
Specified by:
getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

initialize

public void initialize(org.apache.hadoop.mapreduce.InputSplit split,
                       org.apache.hadoop.mapreduce.TaskAttemptContext context)
                throws IOException,
                       InterruptedException
Specified by:
initialize in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

nextKeyValue

public boolean nextKeyValue()
                     throws IOException,
                            InterruptedException
Specified by:
nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.