Easiest efficient way to zip the output of Hadoop MapReduce

I can compress MapReduce output with gzip by setting

"mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"

Would it be straightforward to implement a zip codec for Hadoop? Zip is a container format, but I need only one file per archive, so would it be easy to create a ZipCodec that implements the CompressionCodec interface?

Or is there an efficient way to convert .gz files to .zip files, since both formats can use the same DEFLATE algorithm?

Answers


No big deal: you can wrap a java.util.zip.ZipOutputStream.

To do this, implement your own codec by extending org.apache.hadoop.io.compress.DefaultCodec.

In this codec, wrap the Java zip streams in subclasses of org.apache.hadoop.io.compress.CompressorStream and org.apache.hadoop.io.compress.DecompressorStream, respectively.

Finally, override the createInputStream and createOutputStream methods and return new instances of the wrapped streams from them.
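As a rough illustration of the wrapping step, here is a minimal sketch of the output side that uses only java.util.zip, so it runs without Hadoop on the classpath. The class name SingleEntryZipOutputStream and the entry name passed to it are made up for this example. In a real codec this logic would have to live in a subclass of Hadoop's compression stream classes and be returned from createOutputStream, which is not shown here.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: writes everything it receives into one entry of a zip archive. */
class SingleEntryZipOutputStream extends FilterOutputStream {
    private final ZipOutputStream zip;

    SingleEntryZipOutputStream(OutputStream rawOut, String entryName) throws IOException {
        super(new ZipOutputStream(rawOut));
        this.zip = (ZipOutputStream) this.out;
        // Open the one and only entry up front; all later writes go into it.
        zip.putNextEntry(new ZipEntry(entryName));
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Forward whole buffers instead of FilterOutputStream's byte-at-a-time default.
        zip.write(b, off, len);
    }

    /** Closes the entry and writes the zip central directory, leaving rawOut open. */
    public void finish() throws IOException {
        zip.finish();
    }

    @Override
    public void close() throws IOException {
        // ZipOutputStream.close() finishes the archive and closes the underlying stream.
        zip.close();
    }
}
```

Wrapping a ByteArrayOutputStream or a file stream in this class and writing to it yields an archive with exactly one entry, which matches the "one file per archive" requirement from the question.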

It is still a bit of coding. I'm fairly sure an implementation already exists somewhere (as I recall, one was even included in a Hadoop release years ago).
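For the other option raised in the question, converting existing .gz output to .zip after the job, the following is a minimal sketch using only java.util.zip. The class name GzToZip and the method convert are invented for this example. Note that it takes the simple route: it inflates the gzip data and deflates it again into the zip entry, rather than reusing the already-compressed DEFLATE bytes, so it costs one extra decompression and compression pass.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper for turning one gzip stream into a single-entry zip archive. */
class GzToZip {
    /** Inflates gzip data from gzIn and re-deflates it into one zip entry on zipOut. */
    static void convert(InputStream gzIn, OutputStream zipOut, String entryName)
            throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(gzIn);
             ZipOutputStream out = new ZipOutputStream(zipOut)) {
            out.putNextEntry(new ZipEntry(entryName));
            byte[] buf = new byte[64 * 1024];
            int n;
            // Stream in chunks so large part files never have to fit in memory.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.closeEntry();
        }
    }
}
```

Both streams are closed when convert returns, so the caller should pass streams it does not need afterwards.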

