Java OutOfMemoryError while merging large file parts from chunked files

I have a problem when a user uploads large files (> 1 GB). I'm using the flow.js library, and the upload produces hundreds of thousands of small chunk files (e.g. 100 KB each) in a temporary directory, but merging them into a single file fails with an OutOfMemoryError. This does not happen when the file is under 1 GB. I know it sounds tedious and you will probably suggest increasing the -Xmx of my container, but I want another angle besides that.

Here is my code:

private void mergeFile(String identifier, int totalFile, String outputFile) throws AppException{
    File[] fileDatas = new File[totalFile]; //we know the number of parts here and create an array of that size
    byte fileContents[] = null;
    int totalFileSize = 0;
    int filePartUploadSize = 0;
    int tempFileSize = 0;
    //I'm creating the array of files and summing up their lengths
    for (int i = 0; i < totalFile; i++) {
        fileDatas[i] = new File(identifier + "." + (i + 1)); //identifier is the name of the file
        totalFileSize += fileDatas[i].length();
    }

    try {
        fileContents = new byte[totalFileSize];
        InputStream inStream;
        for (int j = 0; j < totalFile; j++) {
            inStream = new BufferedInputStream(new FileInputStream(fileDatas[j]));
            filePartUploadSize = (int) fileDatas[j].length();
            inStream.read(fileContents, tempFileSize, filePartUploadSize);
            tempFileSize += filePartUploadSize;
            inStream.close();
        }
    } catch (FileNotFoundException ex) {
        throw new AppException(AppExceptionCode.FILE_NOT_FOUND);
    } catch (IOException ex) {
        throw new AppException(AppExceptionCode.ERROR_ON_MERGE_FILE);
    } finally {
        write(fileContents, outputFile);
        for (int l = 0; l < totalFile; l++) {
            fileDatas[l].delete();
        }
    }
}

Please show me the "inefficiency" of this method. Once again: only large files cannot be merged using this method; smaller ones (< 1 GB) are no problem at all. I would appreciate it if you did not suggest increasing the heap memory and instead showed me the fundamental error of this method...

Thanks

Answers


It's unnecessary to allocate the entire file in memory by declaring a byte array of the total size. Building the concatenated file in memory at all is unnecessary.

Just open an output stream for your target file, and then for each file you are combining to make it, read it as an input stream and write the bytes to the output stream, closing each input as you finish. Then when you're done with them all, close the output file. Total memory use will be a few thousand bytes for the buffer.

Also, don't do I/O operations in the finally block (other than closing resources).

Here is a rough example you can play with.

        ArrayList<File> files = new ArrayList<>();// put your files here
        File output = new File("yourfilename");
        BufferedOutputStream boss = null;
        try 
        {
            boss = new BufferedOutputStream(new FileOutputStream(output));
            for (File file : files) 
            {
                BufferedInputStream bis = null;
                try
                {
                    bis = new BufferedInputStream(new FileInputStream(file));
                    int data;
                    while ((data = bis.read()) >= 0) // read() returns -1 at end of stream
                    {
                        boss.write(data);
                    }
                }
                catch (Exception e)
                {
                    //do error handling stuff, log it maybe? 
                }
                finally
                {
                    try
                    {
                        bis.close();//do this in a try catch just in case
                    }
                    catch (Exception e)
                    {
                        //handle this
                    }
                }               
            }
        } catch (Exception e) 
        {
            //handle this
        }
        finally
        {
            try 
            {
                boss.close();
            } 
            catch (Exception e) {
                //handle this
            }
        }
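
One detail the example leaves open is how the files list gets populated; with the chunk naming scheme from the question it could be built like this (reusing the question's identifier and totalFile names purely for illustration):

    ArrayList<File> files = new ArrayList<>();
    for (int i = 0; i < totalFile; i++) {
        files.add(new File(identifier + "." + (i + 1))); // chunks are named identifier.1, identifier.2, ...
    }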

... show me the fundamental error of this method

The implementation flaw is that you are creating a byte array (fileContents) whose size is the total file size. If the total file size is too big, that will cause an OOME. Inevitably.

Solution - don't do that! Instead "stream" the file by reading from the "chunk" files and writing to the final file using a modest-sized buffer.

There are other problems with your code too. For instance, it could leak file descriptors because you are not ensuring that inStream is closed under all circumstances. Read up on the "try-with-resources" construct.
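
For reference, here is a rough sketch of how the original mergeFile might be reworked along those lines with try-with-resources and a fixed-size buffer; it keeps the question's parameter names and assumes the same AppException / AppExceptionCode types, so treat it as an outline rather than a drop-in replacement:

    private void mergeFile(String identifier, int totalFile, String outputFile) throws AppException {
        byte[] buffer = new byte[8192]; // modest fixed-size buffer, reused for every chunk
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(outputFile))) {
            for (int i = 0; i < totalFile; i++) {
                File part = new File(identifier + "." + (i + 1));
                // try-with-resources closes the chunk stream even if the copy fails
                try (InputStream in = new BufferedInputStream(new FileInputStream(part))) {
                    int read;
                    while ((read = in.read(buffer)) > 0) {
                        out.write(buffer, 0, read); // write only the bytes actually read
                    }
                }
                part.delete(); // remove the chunk once it has been copied
            }
        } catch (FileNotFoundException ex) {
            throw new AppException(AppExceptionCode.FILE_NOT_FOUND);
        } catch (IOException ex) {
            throw new AppException(AppExceptionCode.ERROR_ON_MERGE_FILE);
        }
    }

Because only the 8 KB buffer is ever held in memory, the size of the final file no longer affects heap usage.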

