Get the unique word count of each word in Hive
I have a table like the following:
select * from tablename;
ID  sentence
1   This is a sentence
2   This might be a test
3   America
4   This this
I want to write a query that splits each sentence into words and returns the count of each word in descending order. The output should look something like this:
word   count  Unique(ids)
This   4      3
a      2      2
might  1      1
. . .
where count is the number of times the word has occurred in the column and Unique(ids) is the number of users with that word.
How can I write a query to do this?
Can anybody help me do this in Hive?
select id, word
from tablename tn
lateral view explode(split(tn.sentence, ' ')) tb as word
The result will be:
1  This
1  is
1  a
1  sentence
2  This
2  might
2  be
2  a
2  test
3  America
4  This
4  this
Then aggregate the result, grouping by word to get the total count and the number of distinct IDs.
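The aggregation step could be sketched roughly as follows. This assumes the column is named sentence, as in the question, and that words should be matched case-insensitively, since the expected output counts "This" and "this" together. The lower() call and the aliases w, cnt and unique_ids are my own choices for illustration, not something from the question.

```sql
-- Sketch only: wraps the explode query in a subquery, then groups by word.
SELECT w.word,
       COUNT(*)             AS cnt,         -- total occurrences of the word
       COUNT(DISTINCT w.id) AS unique_ids   -- number of distinct IDs containing it
FROM (
  SELECT tn.id,
         lower(tb.word) AS word             -- assumption: case-insensitive matching
  FROM tablename tn
  LATERAL VIEW explode(split(tn.sentence, ' ')) tb AS word
) w
GROUP BY w.word
ORDER BY cnt DESC;
```

With the sample data, "this" should come out with a count of 4 across 3 IDs and "a" with a count of 2 across 2 IDs. Words are returned in lowercase because of lower(). Drop that call if the counts should be case-sensitive. The order of rows with equal counts is not guaranteed.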