How do I use Avro to process a stream that I cannot seek?

I am using Avro 1.4.0 to read some data out of S3 via the Python avro bindings and the boto S3 library. When I open an avro.datafile.DataFileReader on the file-like objects returned by boto, it fails immediately when it tries to seek(). For now I am working around this by reading the S3 objects into temporary files.

I would like to be able to stream through any Python object that supports read(). Can anybody provide advice?

Answers


I am not very clear on this, and this may not be the answer. I was under the impression that

diter = datafile.DataFileReader(..) 

returns an iterator so that you could do the following

for data in diter:
    ....

Correct me if I am wrong here.

Revisiting my answer:

You are right, datafile.DataFileReader does not play well with a reader for which seek would fail.

It uses avro.io.BinaryDecoder, which accepts a reader:

class BinaryDecoder(object):
    """Read leaf values."""
    def __init__(self, reader):
        """
        reader is a Python object on which we can call read, seek, and tell.
        """
        self._reader = reader

What you can do is create your own reader class that provides these functions (read, seek, and tell) but internally uses the boto S3 library to read the data.
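Here is a minimal sketch of such a wrapper, not a tested drop-in for Avro. The class name ForwardOnlyReader and its length parameter are made up for illustration. It assumes the consumer only ever reads forward: seeks are just recorded, skipping ahead is done by reading and discarding, and a read at a position that has already been consumed raises IOError. I have not checked exactly which seeks Avro 1.4.0 performs, so treat that as an assumption; if it really needs to read backwards, you would have to buffer instead.

```python
class ForwardOnlyReader(object):
    """Hypothetical wrapper: read/seek/tell on top of a read()-only stream."""

    def __init__(self, stream, length=None):
        self._stream = stream    # any object with read(n)
        self._length = length    # total size in bytes, if known (for whence=2)
        self._pos = 0            # logical position reported by tell()
        self._consumed = 0       # bytes actually pulled from the stream

    def tell(self):
        return self._pos

    def seek(self, offset, whence=0):
        # Only record the target; nothing is read until read() is called.
        if whence == 0:
            target = offset
        elif whence == 1:
            target = self._pos + offset
        elif whence == 2:
            if self._length is None:
                raise IOError("cannot seek from the end: length unknown")
            target = self._length + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if target < 0:
            raise IOError("negative seek position")
        self._pos = target

    def _pull(self, n):
        """Read up to n bytes, looping because streams may return short reads."""
        parts = []
        while n > 0:
            chunk = self._stream.read(n)
            if not chunk:
                break
            parts.append(chunk)
            n -= len(chunk)
        data = b"".join(parts)
        self._consumed += len(data)
        return data

    def read(self, n=-1):
        if self._pos < self._consumed:
            raise IOError("cannot read backwards on a forward-only stream")
        # Skip ahead to the logical position by reading and discarding.
        while self._consumed < self._pos:
            if not self._pull(min(65536, self._pos - self._consumed)):
                return b""       # the seek went past the end of the stream
        if n is None or n < 0:
            data = self._stream.read()
            self._consumed += len(data)
        else:
            data = self._pull(n)
        self._pos = self._consumed
        return data
```

With boto you would then try something like datafile.DataFileReader(ForwardOnlyReader(key, length=key.size), io.DatumReader()), where key is the boto Key object. If DataFileReader still fails with the IOError above, it is seeking backwards somewhere, and reading the object into memory or a temporary file (as you do now) remains the simplest option.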

