How to put files in flume spooldir one by one?

I am using the Flume spooldir source to put files into HDFS, but I end up with many small files in HDFS. I considered tuning the batch size and roll interval, but I don't want to depend on size or interval settings. So I decided to push files into the Flume spooldir one at a time. How can I do this?

Answers


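According to https://flume.apache.org/FlumeUserGuide.html#spooling-directory-source, setting a1.sources.src-1.fileHeader = true makes the source add a header with the absolute path of the file to every event (header name "file" by default), and setting basenameHeader = true adds a header with just the file name (header name "basename" by default). You can then reference any such header in the HDFS sink path as %{headername} (see %{host} in the escape sequence description at https://flume.apache.org/FlumeUserGuide.html#hdfs-sink).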

EDIT: For an example config, you can try the following:

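# Source: watch /flumespool and tag every event with the name of its source file
a1.sources = r1
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /flumespool
# Adds a "basename" header holding the file name (without the directory)
a1.sources.r1.basenameHeader = true

a1.channels = c1
a1.channels.c1.type = memory

# Sink: write to one HDFS directory per source file, named after the header value
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flumeout/%{basename}
# DataStream writes the raw event bodies instead of a SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream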
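
Note that with the HDFS sink defaults (hdfs.rollInterval = 30, hdfs.rollSize = 1024, hdfs.rollCount = 10) the sink still starts a new file every few events, so the config above can keep producing small files inside each directory. A possible addition, sketched here on the assumption that each spooled file should end up as a single HDFS file, is to disable all three roll triggers and close files on inactivity instead. The 60-second timeout is only an illustrative value:

# Assumed addition: disable time-, size- and count-based rolling (0 turns each off)
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
# Close a file after 60 s without writes (illustrative value; the default 0 never closes idle files)
a1.sinks.k1.hdfs.idleTimeout = 60

The idle timeout is still time based, but it only fires once events for a given file have stopped arriving, so it does not split a file that is still being read.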
