Why can I only read 1024 bytes at a time with ObjectInputStream?

I have written the following code which writes 4000 bytes of 0s to a file test.txt. Then, I read the same file in chunks of 1000 bytes at a time.

FileOutputStream output = new FileOutputStream("test.txt");
ObjectOutputStream stream = new ObjectOutputStream(output);

byte[] bytes = new byte[4000];

stream.write(bytes);
stream.close();

FileInputStream input = new FileInputStream("test.txt");
ObjectInputStream s = new ObjectInputStream(input);


byte[] buffer = new byte[1000];
int read = s.read(buffer);

while (read > 0) {
    System.out.println("Read " + read);
    read = s.read(buffer);
}

s.close();

What I expect to happen is to read 1000 bytes four times.

Read 1000
Read 1000
Read 1000
Read 1000

However, what actually happens is that I seem to get "paused" (for lack of a better word) every 1024 bytes.

Read 1000
Read 24
Read 1000
Read 24
Read 1000
Read 24
Read 928

If I try to read more than 1024 bytes, then I get capped at 1024 bytes. If I try to read less than 1024 bytes, I'm still required to pause at the 1024 byte mark.

Upon inspecting the output file test.txt in hexadecimal, I noticed that a five-byte sequence, 7A 00 00 04 00, recurs every 1029 bytes, despite the fact that I have written only 0s to the file. Here is the output from my hex editor. (Would be too long to fit in question.)

So my question is: Why are these five bytes appearing in my file when I have written entirely 0s? Do these 5 bytes have something to do with the pause that occurs every 1024 bytes? Why is this necessary?

Answers


The object streams use an internal 1024-byte buffer and write primitive data in chunks of that size. Each chunk is a block headed by a Block Data marker, which is, guess what, 0x7A followed by a 32-bit length word (or 0x77 followed by an 8-bit length word for blocks of up to 255 bytes). A single read() never crosses a block boundary, so it can return at most 1024 bytes.
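To see those markers without a hex editor, here is a minimal sketch that serializes 4000 zero bytes into memory and prints the bytes at the offsets where the headers should sit. The class and method names (BlockDataDump, serializeZeros, hex) are made up for illustration, and the offsets assume the usual 4-byte stream header followed by blocks of 5 header bytes plus 1024 data bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of the original question or answers.
final class BlockDataDump {

    /** Writes count zero bytes through an ObjectOutputStream and returns the raw stream bytes. */
    static byte[] serializeZeros(int count) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            out.write(new byte[count]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Formats len bytes starting at offset as space-separated upper-case hex. */
    static String hex(byte[] data, int offset, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = offset; i < offset + len; i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = serializeZeros(4000);
        // 4 bytes of stream header + 4 blocks * 5 header bytes + 4000 data bytes
        System.out.println("total bytes:   " + raw.length);
        System.out.println("stream header: " + hex(raw, 0, 4));
        System.out.println("first block:   " + hex(raw, 4, 5));
        System.out.println("second block:  " + hex(raw, 4 + 1029, 5));
        System.out.println("last block:    " + hex(raw, 4 + 3 * 1029, 5));
    }
}
```

The first two block headers should read 7A 00 00 04 00 (0x0400 = 1024), which is the sequence from the question, and the last one should carry the shorter length of the 928-byte remainder.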

The real question here is why you're using object streams just to read and write bytes. Use buffered streams. Then the buffering is under your control, and incidentally there's zero space overhead, unlike the object streams which have stream headers and type codes.
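For comparison, a rough sketch of the same experiment with plain buffered streams. BufferedChunks and its methods are hypothetical names; it writes to a temp file rather than test.txt, and it assumes Java 9 or later for InputStream.readNBytes, which keeps reading until the buffer is full or the stream ends.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original question or answers.
final class BufferedChunks {

    /** Creates a temp file containing count zero bytes, with no headers or framing. */
    static Path writeZeros(int count) {
        try {
            Path file = Files.createTempFile("zeros", ".bin");
            try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file))) {
                out.write(new byte[count]);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the file in chunks of up to chunkSize bytes and returns the size of each chunk. */
    static List<Integer> readChunkSizes(Path file, int chunkSize) {
        List<Integer> sizes = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int read;
            // readNBytes returns 0 only when the stream is already at its end.
            while ((read = in.readNBytes(buffer, 0, buffer.length)) > 0) {
                sizes.add(read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        Path file = writeZeros(4000);
        System.out.println("File size: " + file.toFile().length());
        for (int size : readChunkSizes(file, 1000)) {
            System.out.println("Read " + size);
        }
        file.toFile().delete();
    }
}
```

The file should be exactly 4000 bytes long, since nothing but the payload is written, and the loop should report four reads of 1000 bytes each.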

NB serialized data is not text and shouldn't be stored in files named .txt.


ObjectOutputStream and ObjectInputStream are special streams used for serialization of objects.

But when you do stream.write(bytes); you are trying to use the ObjectOutputStream as a regular stream, for writing 4000 bytes, not for writing an array-of-bytes object. When data are written like this to an ObjectOutputStream they are handled specially.

From the documentation of ObjectOutputStream:

Primitive data, excluding serializable fields and externalizable data, is written to the ObjectOutputStream in block-data records. A block data record is composed of a header and data. The block data header consists of a marker and the number of bytes to follow the header. Consecutive primitive data writes are merged into one block-data record. The blocking factor used for a block-data record will be 1024 bytes. Each block-data record will be filled up to 1024 bytes, or be written whenever there is a termination of block-data mode.

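This explains the behaviour you are seeing: your 4000 bytes are split into block-data records of at most 1024 bytes, and each read() stops at the end of the current record.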
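The effect on the reading side can be reproduced in memory. The sketch below is only an illustration with made-up names (BlockBoundaryReads, readSizes, readFullyChunks). It also shows readFully, which is not mentioned in the answers here but is part of ObjectInputStream's DataInput interface and keeps reading across records until the buffer is full.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original question or answers.
final class BlockBoundaryReads {

    /** Writes count zero bytes through an ObjectOutputStream and returns the raw stream bytes. */
    static byte[] serializeZeros(int count) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            out.write(new byte[count]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Calls read(buffer) until end of stream and records what each call returned. */
    static List<Integer> readSizes(byte[] raw, int chunkSize) {
        List<Integer> sizes = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            int read;
            while ((read = in.read(buffer)) > 0) {
                sizes.add(read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    /** Reads the given number of full chunks with readFully and returns the total bytes read. */
    static int readFullyChunks(byte[] raw, int chunkSize, int chunks) {
        byte[] buffer = new byte[chunkSize];
        int total = 0;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            for (int i = 0; i < chunks; i++) {
                in.readFully(buffer); // throws EOFException if the data runs out early
                total += buffer.length;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] raw = serializeZeros(4000);
        System.out.println("read():      " + readSizes(raw, 1000));
        System.out.println("readFully(): " + readFullyChunks(raw, 1000, 4) + " bytes in 4 chunks");
    }
}
```

The read() sizes should match the 1000/24 pattern from the question, ending with 928, while readFully fills all four 1000-byte buffers.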

I would recommend that you either use BufferedOutputStream instead of ObjectOutputStream, or, if you really want to use ObjectOutputStream, use writeObject() instead of write(). The same applies on the input side: use BufferedInputStream, or readObject() instead of read().


I suggest you use a try-with-resources statement to handle closing your resources, add buffering with BufferedInputStream and BufferedOutputStream, and then use writeObject and readObject to serialize your byte[]. Something like,

try (OutputStream output = new BufferedOutputStream(//
        new FileOutputStream("test.txt"), 8192); //
        ObjectOutputStream stream = new ObjectOutputStream(output)) {
    byte[] bytes = new byte[4000];

    stream.writeObject(bytes);
} catch (IOException ioe) {
    ioe.printStackTrace();
}

and then read it back like this:

try (InputStream input = new BufferedInputStream(//
        new FileInputStream("test.txt"), 8192); //
        ObjectInputStream s = new ObjectInputStream(input)) {
    byte[] bytes = (byte[]) s.readObject();
} catch (IOException | ClassNotFoundException ioe) {
    ioe.printStackTrace();
}

If only part of an array is to be written, you'll need to write the length as well. You can use stream.writeInt(len); on the writing side and int len = stream.readInt(); on the reading side.
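One possible reading of that suggestion, as a sketch only: write the length first, then just the first len bytes with write(data, 0, len), and on the other side read the length and fill an array of that size with readFully. The class and method names (LengthPrefixed, write, read) are hypothetical, and the example round-trips through memory instead of a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper class, not part of the original question or answers.
final class LengthPrefixed {

    /** Serializes only the first len bytes of data, preceded by their count. */
    static byte[] write(byte[] data, int len) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream stream = new ObjectOutputStream(sink)) {
            stream.writeInt(len);
            stream.write(data, 0, len);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Reads the length prefix, then exactly that many bytes. */
    static byte[] read(byte[] raw) {
        try (ObjectInputStream stream = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            int len = stream.readInt();
            byte[] result = new byte[len];
            stream.readFully(result); // crosses block-data records as needed
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {10, 20, 30, 40, 50, 60};
        byte[] restored = read(write(data, 4));
        System.out.println(Arrays.toString(restored));
    }
}
```

Only the first four values should come back, because the length prefix tells the reader where the payload ends.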

