C#: Code to fit LOTS of files onto a DVD as efficiently as possible

I need to write an application that will take a list of files (some large, some small) and fit them onto DVDs (or CDs, or whatever) as efficiently as possible. The whole point of this application is to use up as much of the 1st disc before moving onto the 2nd disc, filling the 2nd disc up as much as possible before moving onto the 3rd disc, etc.

(Note: The application doesn't have to do the actual burning to the DVD; it just has to figure out the best possible fit.)

I initially thought I had a good game plan: generate every permutation of the files and then check each one to see which fits best. (My request for help on this can be found HERE)

But the more files there are, the longer it takes... exponentially. So I wanted some of your opinions on how to best achieve this.

Any ideas? And, as always, C# code is appreciated.

Answers


Simple algorithm:

  1. Sort the file list by file size.
  2. Find the largest file that is no bigger than the remaining free space on the DVD, and add it to the DVD.
  3. If the remaining free space on the DVD is smaller than every remaining file, start a new DVD.
  4. Repeat from 2 until no files are left.
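
The steps above can be sketched roughly as follows. This is only an illustration, written in Python for brevity (the loops port directly to C#); the function name `pack_largest_fit` and the use of plain integer sizes are my own assumptions, not part of the original answer.

```python
def pack_largest_fit(sizes, capacity):
    """Greedy sketch: keep adding the largest file that still fits on the
    current disc; start a new disc when nothing else fits."""
    oversized = [s for s in sizes if s > capacity]
    if oversized:
        raise ValueError(f"files larger than one disc: {oversized}")

    remaining = sorted(sizes, reverse=True)  # step 1: sort, largest first
    discs = []
    while remaining:
        disc, free = [], capacity
        i = 0
        # Step 2: because the list is sorted descending, the first file
        # that fits is always the largest one that fits.
        while i < len(remaining):
            if remaining[i] <= free:
                free -= remaining[i]
                disc.append(remaining.pop(i))
            else:
                i += 1
        # Step 3: nothing left fits, so this disc is done.
        discs.append(disc)
    return discs


# Hypothetical sizes (say, in units of 100 MB) and a capacity of 6 units.
print(pack_largest_fit([1, 1, 2, 2, 3, 4, 5], 6))
```

This is a heuristic: it is fast, but it does not promise the minimum number of discs.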

What you're facing is related to the knapsack problem. The linked Wikipedia page has lots more information, including suggested ways of solving it.


For anyone still interested in this question... I wrote a utility which I used for a similar purpose of fitting files into a set of disks/discs. It uses a command-line/file-based interface. Versions are available in C, C++, & Java (not C#).

http://whizman.com/code/diskfit.tgz

More detailed information is in the diskfit.tgz:Doc/diskfit.txt file.

(AGPL3)

We might characterize the question as 0-1 multiple-knapsack, or linear bin packing. (Thanks jon-skeet for the link about knapsack problem.)

Dthorpe's answer solves linear bin packing, using just enough bins/disks to fit all files [it is nicely fast at O(n) or O(n lg n), and may even be feasible in a spreadsheet without having to write a script].

Basically, diskfit (above-linked utility) outputs qualifying file-sets based around 0-1 single-knapsack, and the user chooses one-disk file-sets to assemble into the disk-set - assisting the user (but not fully automating) toward both:

  • linear bin packing - for the complete disk set;
  • 0-1 multiple-knapsack - for each subset of disks 1..k of the full disk set (where files are prioritized, aka differ in value).

Fully programmatic selection of the complete disk set would be an additional feature. It would not be enough to apply a 0-1 single-knapsack solution automatically, disc by disc [greedily]. (Consider 3 knapsacks of capacity 6, and available items of equal value with weights {1, 1, 2, 2, 3, 4, 5}. Applying 0-1 knapsack to the first knapsack in isolation would choose {1, 1, 2, 2} for a total value of 4, after which the remaining 3 items cannot all fit in the second and third knapsacks, whereas we know all items fit in the 3 knapsacks as {1, 2, 3}, {1, 5} and {2, 4}.)
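
The counterexample can be checked with a few lines of code. The sketch below is not part of diskfit; the function names are made up for illustration, and the "knapsack" step is a brute-force search that simply maximizes the number of items (which is what equal values amount to).

```python
from itertools import combinations


def max_count_subset(weights, capacity):
    """0-1 knapsack with equal values: indices of the largest number of
    items whose total weight fits. Brute force, fine for tiny inputs."""
    for k in range(len(weights), 0, -1):
        for idxs in combinations(range(len(weights)), k):
            if sum(weights[i] for i in idxs) <= capacity:
                return list(idxs)
    return []


def greedy_disc_by_disc(weights, capacity, n_discs):
    """Solve each knapsack in isolation, one after another."""
    remaining, discs = list(weights), []
    for _ in range(n_discs):
        chosen = max_count_subset(remaining, capacity)
        discs.append([remaining[i] for i in chosen])
        remaining = [w for i, w in enumerate(remaining) if i not in chosen]
    return discs, remaining  # 'remaining' holds items that did not fit


def pack_all(weights, capacity, n_discs):
    """Backtracking search for a packing of every item, or None."""
    items = sorted(weights, reverse=True)
    free = [capacity] * n_discs
    discs = [[] for _ in range(n_discs)]

    def place(pos):
        if pos == len(items):
            return True
        tried = set()  # skip discs with identical free space (symmetry)
        for d in range(n_discs):
            if items[pos] <= free[d] and free[d] not in tried:
                tried.add(free[d])
                free[d] -= items[pos]
                discs[d].append(items[pos])
                if place(pos + 1):
                    return True
                free[d] += items[pos]
                discs[d].pop()
        return False

    return discs if place(0) else None


weights = [1, 1, 2, 2, 3, 4, 5]
print(greedy_disc_by_disc(weights, 6, 3))  # greedy leaves an item over
print(pack_all(weights, 6, 3))             # a full packing exists
```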


for each file
 is there enough room on this dvd?
   yes, store it here
   no, is there room on another already allocated dvd?
     yes, store it there
     no, allocate another dvd and store it there
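
A runnable reading of the pseudocode above might look like the sketch below. The function name `first_fit`, the `(name, size)` tuples, and the order in which the other discs are checked (earliest allocated first) are my assumptions; the pseudocode does not specify them.

```python
def first_fit(files, capacity):
    """files: list of (name, size) pairs. Returns a list of discs, each a
    list of file names, following the pseudocode's decision order."""
    discs = []  # each disc: {"free": space left, "files": [names]}
    for name, size in files:
        if size > capacity:
            raise ValueError(f"{name} is larger than a whole disc")
        # Current (most recently allocated) disc first, then earlier ones.
        candidates = discs[-1:] + discs[:-1]
        target = next((d for d in candidates if size <= d["free"]), None)
        if target is None:
            # No allocated disc has room: allocate another one.
            target = {"free": capacity, "files": []}
            discs.append(target)
        target["free"] -= size
        target["files"].append(name)
    return [d["files"] for d in discs]


# Hypothetical file names and sizes.
print(first_fit([("a", 4), ("b", 3), ("c", 2), ("d", 1)], 6))
```

The result depends on the order of the input list, so sorting the files by size first usually helps.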

While that's a cool problem to solve in a program for certain applications, why not just use WinRAR or some other archiving program that can split an archive into chunks of a specific size? You could make each chunk the size of a DVD and then just burn away.

EDIT: One issue you would run into with the per-file approach is that if one of your files is larger than your media, you are not going to be able to burn that file.


How about starting by putting as many of the largest files as you can onto one DVD, and then filling it up with as many of the smallest files as you can (starting with the smallest)?

Repeat this process with the remaining files for each disk.
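
One way to read this idea in code is sketched below. The name `pack_big_then_small` is made up, and I am assuming that "as many of the largest files as you can" means taking files from the large end until the next one no longer fits.

```python
def pack_big_then_small(sizes, capacity):
    """Per disc: take the largest files while they fit, then top up with
    the smallest files while they fit. Repeat for the remaining files."""
    if any(s > capacity for s in sizes):
        raise ValueError("at least one file is larger than a whole disc")

    remaining = sorted(sizes)  # smallest at the front, largest at the back
    discs = []
    while remaining:
        disc, free = [], capacity
        # Largest files first.
        while remaining and remaining[-1] <= free:
            free -= remaining[-1]
            disc.append(remaining.pop())
        # Then fill the gap with the smallest files.
        while remaining and remaining[0] <= free:
            free -= remaining[0]
            disc.append(remaining.pop(0))
        discs.append(disc)
    return discs


print(pack_big_then_small([1, 1, 2, 2, 3, 4, 5], 6))
```

On these sample sizes the sketch uses four discs although three would be enough, which shows that the heuristic is not optimal.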

I'm not sure that's going to give you perfect coverage/distribution but I think it might go some way to solving your needs.


Use backtracking to get the optimal set of files to burn to DVD 1, then exclude them from the list and use backtracking on the remaining files to get the optimal fill for DVD 2, and so on.
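
A minimal sketch of that approach, assuming every size is a positive integer. The names `best_subset` and `fill_discs` are made up for illustration. The search is exponential in the worst case, and filling each disc optimally in turn does not guarantee the fewest discs overall.

```python
def best_subset(sizes, capacity):
    """Backtracking with pruning: indices of the files whose total size is
    as large as possible without exceeding capacity."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    # suffix[k] = total size of order[k:], used to prune hopeless branches.
    suffix = [0] * (len(order) + 1)
    for k in range(len(order) - 1, -1, -1):
        suffix[k] = suffix[k + 1] + sizes[order[k]]

    best_total, best_pick, pick = 0, [], []

    def search(k, total):
        nonlocal best_total, best_pick
        if total > best_total:
            best_total, best_pick = total, list(pick)
        if k == len(order) or best_total == capacity:
            return  # nothing left to try, or the disc is already full
        if total + suffix[k] <= best_total:
            return  # even taking every remaining file cannot do better
        i = order[k]
        if total + sizes[i] <= capacity:
            pick.append(i)            # branch 1: take file i
            search(k + 1, total + sizes[i])
            pick.pop()
        search(k + 1, total)          # branch 2: skip file i

    search(0, 0)
    return best_pick


def fill_discs(sizes, capacity):
    """Fill disc 1 as well as possible, remove those files, repeat."""
    remaining, discs = list(sizes), []
    while remaining:
        chosen = set(best_subset(remaining, capacity))
        if not chosen:
            raise ValueError("a remaining file does not fit on an empty disc")
        discs.append([remaining[i] for i in sorted(chosen)])
        remaining = [s for i, s in enumerate(remaining) if i not in chosen]
    return discs


print(fill_discs([1, 1, 2, 2, 3, 4, 5], 6))
```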


I've found a lot of tools that are supposed to solve this problem, but they all try to minimize the TOTAL number of discs used, while I was just interested in the SINGLE subset of files that best fits a SINGLE disc.

So I ended up writing my own tool called "ss" (named after the "subset sum" algorithm it is based on). The tool is still buggy and can't recurse into directories, but it's working for me. :)
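
For readers who want to see the single-disc idea in code, here is a minimal subset-sum sketch using dynamic programming. It is not the code of the "ss" tool; the function name and the assumption that sizes are non-negative integers in some coarse unit (for example MB) are mine.

```python
def best_single_disc(sizes, capacity):
    """Subset-sum by dynamic programming. Returns (total, indices) for the
    subset whose total size is closest to capacity without exceeding it."""
    # reachable maps each achievable total to one tuple of file indices.
    reachable = {0: ()}
    for i, size in enumerate(sizes):
        # Iterate over a snapshot so each file is used at most once.
        for total, picked in list(reachable.items()):
            new_total = total + size
            if new_total <= capacity and new_total not in reachable:
                reachable[new_total] = picked + (i,)
    top = max(reachable)
    return top, list(reachable[top])


# Hypothetical sizes; the best fill of a capacity-10 disc is 7 + 3.
total, indices = best_single_disc([7, 5, 4, 3], 10)
print(total, indices)
```

The table can hold up to capacity + 1 entries, so measuring sizes in a coarse unit keeps it small.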


This problem is the bin packing problem, which is NP-hard: no polynomial-time algorithm is known, so a truly optimal solution can take exponential time in the worst case. However, there are methods that give less-than-optimal solutions but run much faster.

Assume we have an unbounded supply of disks. Take the files in descending order of size and add each one to the first disk it fits on. This is called first-fit decreasing (FFD) and uses at most 11/9 OPT + 6/9 disks in the worst case, where OPT is the optimal number of disks. If you instead take the files in arbitrary order (plain first fit), the worst-case guarantee weakens to about 17/10 OPT disks.
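
A short sketch of first-fit decreasing, together with a trivial lower bound (total size divided by capacity, rounded up) to judge how close the result is. The function names and the integer sizes are illustrative assumptions.

```python
def first_fit_decreasing(sizes, capacity):
    """Place files, largest first, on the first disk with enough room."""
    discs, free = [], []  # free[d] is the space left on discs[d]
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"file of size {size} exceeds disk capacity")
        for d in range(len(discs)):
            if size <= free[d]:
                break
        else:
            # No existing disk has room: open a new one.
            discs.append([])
            free.append(capacity)
            d = len(discs) - 1
        discs[d].append(size)
        free[d] -= size
    return discs


def lower_bound(sizes, capacity):
    """No packing can use fewer disks than ceil(total size / capacity)."""
    return -(-sum(sizes) // capacity)


sizes = [1, 1, 2, 2, 3, 4, 5]
packing = first_fit_decreasing(sizes, 6)
print(len(packing), "disks used, lower bound", lower_bound(sizes, 6))
```

When the number of disks used equals the lower bound, the packing is provably optimal; otherwise the gap only says how far off it could be.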

There are algorithms that will pack things tighter; see the Wikipedia link above for more details.

