How to find the frequency of a value across multiple columns in Hive?

I have data regarding gender of people under 8 columns:

mem1;mem2;mem3;mem4;mem5;mem6;mem7;mem8
MALE;FMALE;UNKN;MALE;FMALE;FMALE;MALE;MALE

Now I want to find the frequency of MALE, FMALE and UNKN using Hive. Something like

MALE 4
FMALE 3
UNKN 1

I'm new to Hive but I know we need to use group by. Can someone please help me with the query?

Answers


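Use Hive's reflect UDF to get the counts.

  1. Create a table that holds the whole line as a single column.

  2. Use reflect to call StringUtils.countMatches on that column. Note that countMatches counts substrings, so "MALE" also matches inside "FMALE"; subtract the FMALE count to get the number of males. Example (a literal string stands in for the column here):

select
  cast(reflect("org.apache.commons.lang.StringUtils", "countMatches", "MALE;FMALE;UNKN;MALE;FMALE;FMALE;MALE;MALE", "MALE") as int)
    - cast(reflect("org.apache.commons.lang.StringUtils", "countMatches", "MALE;FMALE;UNKN;MALE;FMALE;FMALE;MALE;MALE", "FMALE") as int) as males,
  cast(reflect("org.apache.commons.lang.StringUtils", "countMatches", "MALE;FMALE;UNKN;MALE;FMALE;FMALE;MALE;MALE", "FMALE") as int) as females
from mytable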
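Since the question asks about GROUP BY, another option is to keep the eight columns as they are and unpivot them into rows before grouping. The following is only a sketch: it assumes a table named mytable with string columns mem1 through mem8 as shown in the question, and uses Hive's built-in array() and explode() functions with LATERAL VIEW.

```sql
-- Sketch, assuming mytable has string columns mem1..mem8 (names taken from the question).
-- array() packs the eight columns of each row into one array;
-- LATERAL VIEW explode() then emits one output row per array element.
SELECT g.gender,
       COUNT(*) AS freq
FROM mytable t
LATERAL VIEW explode(
  array(t.mem1, t.mem2, t.mem3, t.mem4, t.mem5, t.mem6, t.mem7, t.mem8)
) g AS gender
GROUP BY g.gender;
```

This compares whole values rather than substrings, so MALE and FMALE are counted separately, and any other value that appears in the data gets its own row without having to be listed in the query. The order of the result rows is not guaranteed unless an ORDER BY is added.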

