Converting CSV to SequenceFile

I have a CSV file that I would like to convert to a SequenceFile, which I will ultimately use to create NamedVectors for a clustering job. I have been using the seqdirectory command to create the SequenceFile and then feeding that output into seq2sparse with the -nv option to create NamedVectors. This seems to produce one big vector as output, but I want each line of my CSV to become its own NamedVector. Where am I going wrong?

Answers


The seqdirectory command treats every file as one document, so your CSV is effectively a single document and you get a single vector. For this to work, each line of the CSV would have to be a file of its own, where the file name becomes the document key and the file content becomes the value. That is impractical for a large corpus, though, because reading and writing that many small files can become painfully slow.
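
As a rough illustration of the one-file-per-line idea, here is a minimal Python sketch that splits a CSV into one small text file per row. The function name, the `row-00000000.txt` naming scheme, and the choice to join fields with spaces are all assumptions made for this example, not anything Mahout requires; adapt them to your data.

```python
import csv
from pathlib import Path


def split_csv_into_documents(csv_path, out_dir, encoding="utf-8"):
    """Write each non-empty CSV row to its own text file; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding=encoding) as handle:
        for index, row in enumerate(csv.reader(handle)):
            if not row:
                continue  # skip blank lines
            # Hypothetical naming scheme: zero-padded row number, so files sort in order.
            doc_path = out / f"row-{index:08d}.txt"
            # Assumption: joining fields with spaces is good enough for tokenization.
            doc_path.write_text(" ".join(row) + "\n", encoding=encoding)
            written.append(doc_path)
    return written
```

You would then point seqdirectory at the output directory, so that each file name becomes a key and each row becomes a value. This only makes sense for small inputs, for the reason given above.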

In practice you are better off following the links I share in this comment

