InputStream to Hadoop SequenceFile

I have a generic input stream that represents a sequence file. I would like to create a SequenceFile.Reader, or a similar class, from it without needing to write the stream to a temp file on disk. Is there something that would let me go from an InputStream to something that can read the key/value pairs from that stream?

Answers


I couldn't find any documentation for the current SequenceFile.Reader class, but from reading the Hadoop 2.0+ source, I believe the following would work:

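import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.io.SequenceFile;
// rawInputStream must implement Seekable and PositionedReadable, or this constructor throws IllegalArgumentException
FSDataInputStream inputStream = new FSDataInputStream(rawInputStream);
SequenceFile.Reader.Option isOption = SequenceFile.Reader.stream(inputStream);
SequenceFile.Reader reader = new SequenceFile.Reader(hadoopConf, isOption);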

You may also be interested in the options start and length, where start is how many bytes to skip before reading the sequence file, and length is how many bytes to read from the stream. Setting up that reader might look like this:

FSDataInputStream inputStream = new FSDataInputStream(rawInputStream);
SequenceFile.Reader.Option isOption = SequenceFile.Reader.stream(inputStream);
SequenceFile.Reader.Option lengthOption = SequenceFile.Reader.length(100000);
SequenceFile.Reader.Option startOption = SequenceFile.Reader.start(10);
SequenceFile.Reader reader = new SequenceFile.Reader(hadoopConf, isOption, lengthOption, startOption);

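Finally, once you have constructed your Reader, you can read through the key/value pairs like this:

// Text is an assumption; the key and value classes must match those stored in the file
// (see reader.getKeyClass() and reader.getValueClass()).
Text key = new Text();
Text val = new Text();
while (reader.next(key, val)) {
    // do stuff with key and val
}
reader.close();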

I have not tested this code, but it should work in theory. It relies on the option-based constructor from Hadoop 2.0+; if you are on an earlier version, I'm not sure what to tell you.
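One caveat worth spelling out: the FSDataInputStream constructor rejects any stream that does not implement both Seekable and PositionedReadable, so a truly generic InputStream cannot be passed in directly. If the data fits in memory, one possible workaround is to buffer it into a byte array and wrap that. Below is a minimal, untested-against-Hadoop sketch. The class name SeekableByteArrayInputStream and the fromStream helper are hypothetical names of my own, not Hadoop APIs. The method signatures are written to match Hadoop's Seekable and PositionedReadable interfaces, but the implements clause is left out so the snippet compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Hadoop. With Hadoop on the classpath, declare it as
// "implements org.apache.hadoop.fs.Seekable, org.apache.hadoop.fs.PositionedReadable".
class SeekableByteArrayInputStream extends ByteArrayInputStream {

    SeekableByteArrayInputStream(byte[] data) {
        super(data);
    }

    // Drain an arbitrary stream into memory; only sensible if the data fits in the heap.
    static SeekableByteArrayInputStream fromStream(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new SeekableByteArrayInputStream(out.toByteArray());
    }

    // Seekable: move the read cursor to an absolute offset.
    public void seek(long newPos) throws IOException {
        if (newPos < 0 || newPos > count) {
            throw new EOFException("Cannot seek to " + newPos);
        }
        pos = (int) newPos;
    }

    // Seekable: current absolute offset of the read cursor.
    public long getPos() {
        return pos;
    }

    // Seekable: there is only one copy of the data, so no alternate source exists.
    public boolean seekToNewSource(long targetPos) {
        return false;
    }

    // PositionedReadable: read at an offset without moving the cursor.
    public int read(long position, byte[] buffer, int offset, int length) {
        if (position < 0 || position >= count) {
            return -1;
        }
        int n = Math.min(length, count - (int) position);
        System.arraycopy(buf, (int) position, buffer, offset, n);
        return n;
    }

    // PositionedReadable: like read, but fails unless all requested bytes are available.
    public void readFully(long position, byte[] buffer, int offset, int length) throws IOException {
        if (position < 0 || position + length > count) {
            throw new EOFException("Not enough bytes at " + position);
        }
        System.arraycopy(buf, (int) position, buffer, offset, length);
    }

    public void readFully(long position, byte[] buffer) throws IOException {
        readFully(position, buffer, 0, buffer.length);
    }
}
```

With the two interfaces declared, something like new FSDataInputStream(SeekableByteArrayInputStream.fromStream(rawInputStream)) should get past the constructor check, though I have not run it against a real sequence file. For data too large to hold in memory, this approach does not apply.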

