Hive - static partitioning - difference between creating the partition directory directly vs using alter table statement
Is there any internal or performance difference between the two statements below for creating a static partition in Hive? I have tried both, and both work without any issues after loading data into the partition:
dfs -mkdir /user/cloudera/sqoop_import/avroData/orders_part/order_month=2014-02;
alter table orders_part add partition(order_month='2014-02');
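The command dfs -mkdir /user/cloudera/sqoop_import/avroData/orders_part/order_month=2014-02; does not create a partition; it only creates a directory. That directory is not yet mounted as a table partition. A partition is a directory plus metadata about it (the partition key value and the partition directory), stored in the metastore. You can check this easily by running show partitions orders_part; after the mkdir: the new directory will not appear in the partition list. If the data was then loaded with a statement that names the partition, such as load data inpath '...' into table orders_part partition(order_month='2014-02');, Hive registers the partition at that point. This is why both approaches can appear to behave the same.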
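alter table orders_part add partition(order_month='2014-02'); creates the directory order_month=2014-02 (if it does not already exist) and mounts it as a partition. In other words, it also writes the partition metadata to the metastore.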
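The difference can be sketched with a toy Python model. This is not Hive code and uses no Hive API. The class ToyHiveTable and its methods are hypothetical stand-ins: a set of paths plays the role of HDFS and a dict plays the role of the metastore. It only illustrates that show partitions reads the metastore, not the filesystem:

```python
# Toy model (hypothetical, not Hive's API): a set of paths stands in for HDFS
# and a dict stands in for the metastore's partition records.


class ToyHiveTable:
    """Hypothetical in-memory stand-in for a partitioned Hive table."""

    def __init__(self, location, partition_key):
        self.location = location
        self.partition_key = partition_key
        self.hdfs_dirs = set()  # directories that exist on the "filesystem"
        self.metastore = {}     # partition value -> recorded directory

    def dfs_mkdir(self, path):
        # Like `dfs -mkdir`: touches the filesystem only, never the metastore.
        self.hdfs_dirs.add(path)

    def add_partition(self, value):
        # Like `alter table ... add partition(...)`: records the partition in
        # the metastore and creates the key=value directory if it is missing.
        path = f"{self.location}/{self.partition_key}={value}"
        self.hdfs_dirs.add(path)
        self.metastore[value] = path

    def show_partitions(self):
        # Like `show partitions`: reads only the metastore.
        return [f"{self.partition_key}={v}" for v in sorted(self.metastore)]


table = ToyHiveTable("/user/cloudera/sqoop_import/avroData/orders_part", "order_month")

table.dfs_mkdir(table.location + "/order_month=2014-02")
print(table.show_partitions())  # [] : the directory exists, but no partition

table.add_partition("2014-02")
print(table.show_partitions())  # ['order_month=2014-02']
```

Because the directory already exists, add_partition leaves the filesystem unchanged and only adds the metastore record. That record is the part mkdir alone never creates.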
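Partitions can also be created dynamically with a command such as
insert overwrite table orders_part partition(order_month) select ...
where order_month is the last column of the select list. In this case the directories are created automatically and mounted as partitions. When every partition column is dynamic, this typically requires set hive.exec.dynamic.partition.mode=nonstrict; beforehand.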
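Here is a rough model of what the dynamic insert does with each row. It assumes a single partition column that comes last in the select list. The function dynamic_partition_insert and the sample rows are hypothetical. Real Hive does this routing inside the insert job and writes data files, not Python lists:

```python
from collections import defaultdict


def dynamic_partition_insert(table_location, partition_key, rows):
    """Hypothetical sketch of routing rows for `partition(order_month)`.

    Assumes the dynamic partition column is the last value in each row.
    Returns {partition directory: rows written under it}.
    """
    written = defaultdict(list)
    for row in rows:
        *data, value = row  # last column supplies the partition value
        path = f"{table_location}/{partition_key}={value}"
        # The partition column comes from the directory name,
        # so it is not kept in the stored row.
        written[path].append(tuple(data))
    return dict(written)


# Made-up sample rows: (order_id, status, order_month)
rows = [
    (1, "CLOSED", "2014-01"),
    (2, "PENDING", "2014-02"),
    (3, "COMPLETE", "2014-02"),
]
base = "/user/cloudera/sqoop_import/avroData/orders_part"
for path, data in dynamic_partition_insert(base, "order_month", rows).items():
    print(path, data)
```

Each distinct value of the last column yields one key=value directory. In Hive, each of those directories is also registered as a partition in the metastore.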
Also note that a partition does not have to be located in a directory named 'key=value'. For example:
alter table orders_part add partition(order_month='2014-02') location '/user/cloudera/sqoop_import/avroData/orders_part/mydir';
Here the partition directory is '/user/cloudera/sqoop_import/avroData/orders_part/mydir', while the partition itself is still order_month='2014-02' in the metastore.
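The following minimal sketch shows that the location clause changes only the recorded path, not the partition value. The helper add_partition is hypothetical and works on a plain dict standing in for the metastore. In Hive itself, the recorded path can be inspected with describe formatted orders_part partition(order_month='2014-02');

```python
def add_partition(metastore, table_location, key, value, location=None):
    """Hypothetical helper: record a partition and return its directory.

    Without a location the directory defaults to <table>/<key>=<value>;
    with one, the given path is recorded as-is.
    """
    spec = f"{key}={value}"
    metastore[spec] = location if location is not None else f"{table_location}/{spec}"
    return metastore[spec]


metastore = {}
base = "/user/cloudera/sqoop_import/avroData/orders_part"
print(add_partition(metastore, base, "order_month", "2014-01"))
# /user/cloudera/sqoop_import/avroData/orders_part/order_month=2014-01
print(add_partition(metastore, base, "order_month", "2014-02", location=base + "/mydir"))
# /user/cloudera/sqoop_import/avroData/orders_part/mydir
print(sorted(metastore))
# ['order_month=2014-01', 'order_month=2014-02']
```

The partition names listed come from the metastore keys, so order_month=2014-02 is reported even though its data lives in mydir.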