How do I use Avro to process a stream that I cannot seek?
I am using Avro 1.4.0 to read some data out of S3 via the Python avro bindings and the boto S3 library. When I open an avro.datafile.DataFileReader on the file-like objects returned by boto, it fails immediately when it tries to seek(). For now I am working around this by reading the S3 objects into temporary files.
I would like to be able to stream through any Python object that supports read(). Can anybody provide advice?
I am not sure about this, so this may not be the answer. I was under the impression that
diter = datafile.DataFileReader(..)
returns an iterator so that you could do the following
for data in diter: ....
Correct me if I am wrong here.
Revisiting my answer:
You are right: datafile.DataFileReader does not work with a reader whose seek() fails.
It uses avro.io.BinaryDecoder, which accepts a reader:
    class BinaryDecoder(object):
        """Read leaf values."""
        def __init__(self, reader):
            """
            reader is a Python object on which we can call read, seek, and tell.
            """
            self._reader = reader
What you can do is create your own reader class that provides these functions (read, seek, and tell) but internally uses the boto S3 library to read the data.
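A minimal sketch of such a wrapper follows. The class name ForwardOnlyReader and its length argument are my own inventions, not part of avro or boto. It assumes the total size of the object is known up front and that the consumer only ever seeks to the end to measure the length and then returns to where it was, or skips forward. Seeks are recorded lazily, so only read() touches the underlying stream.

```python
class ForwardOnlyReader(object):
    """Give a read()-only stream the read/seek/tell interface.

    Hypothetical helper, not part of avro or boto. Seeking is only
    bookkeeping; the next read() skips forward if it has to and fails
    if it would have to go backwards.
    """

    def __init__(self, stream, length):
        self._stream = stream   # any object with read(n)
        self._length = length   # total size in bytes (assumed known)
        self._consumed = 0      # bytes actually pulled from the stream
        self._pos = 0           # logical position reported by tell()

    def tell(self):
        return self._pos

    def seek(self, offset, whence=0):
        if whence == 0:         # absolute position
            target = offset
        elif whence == 1:       # relative to the current position
            target = self._pos + offset
        elif whence == 2:       # relative to the end
            target = self._length + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if target < 0:
            raise IOError("negative seek position")
        self._pos = target

    def read(self, n=-1):
        if n == 0:
            return b""          # nothing requested, leave the stream alone
        if self._pos < self._consumed:
            raise IOError("cannot seek backwards on a stream")
        # Skip forward by reading and discarding.
        while self._consumed < self._pos:
            want = min(self._pos - self._consumed, 65536)
            chunk = self._stream.read(want)
            if not chunk:
                break           # stream ended before the target position
            self._consumed += len(chunk)
        if n is None or n < 0:
            data = self._stream.read()
        else:
            data = self._stream.read(n)
        self._consumed += len(data)
        self._pos = self._consumed
        return data
```

With boto this might be used as datafile.DataFileReader(ForwardOnlyReader(key, key.size), avro.io.DatumReader()), where key is the boto Key for the object. That is untested against Avro 1.4.0, and it only works if DataFileReader never needs to go back to data it has already read.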