Failing to load data from Hive into Elasticsearch

I'm trying to load data from Hive into Elasticsearch. I'm using Cloudera CDH 5.3, and I've already added the hadoop-es hive 2.0.2 jar to my Hive path. Elasticsearch 1.4.4 is up and running on 10.44.162.169.

I have a Hive table called hive_cdr with the following columns:

 traffic_type_id (bigint)
 appelant (int)
 called_number (int)
 call_duration (int)
 location_number (string)
 date_heure_appel (string)

I'm trying to define an Elasticsearch-backed table in Hive so that I can load data into it. To do so, I ran this:

CREATE EXTERNAL TABLE es_hive_cdr (
traffic bigint ,
calling int ,
called int ,
duration int ,
location string ,
date string )
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.nodes'='10.44.162.169',
'es.resource'='indexCDR/typeCDR'
) ;

However, I got an exception saying that the EsStorageHandler class is not recognized.

To try to find out what was going on, I removed the EsStorageHandler line and ran the statement again.

I then tried to load data from my hive_cdr table into the new one:

insert into table es_hive_cdr2
select
traffic_type_id,
appelant,
called_number,
call_duration,
location_number,
date_heure_appel
from hive_cdr;

But it fails with the following error (the query plan is included below it):

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

STAGE DEPENDENCIES:

  Stage-1 is a root stage
  Stage-7 depends on stages: Stage-1 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-2 depends on stages: Stage-0
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5

STAGE PLANS:

  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: hive_cdr
            Statistics: Num rows: 267130 Data size: 58768736 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: traffic_type_id (type: bigint), appelant (type: int), called_number (type: int), call_duration (type: int), location_number (type: string), date_heure_appel (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
              Statistics: Num rows: 267130 Data size: 58768736 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 267130 Data size: 58768736 Basic stats: COMPLETE Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.elasticsearch.hadoop.hive.EsSerDe
                    name: default.es_hive_cdr2

  Stage: Stage-7
    Conditional Operator

  Stage: Stage-4
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://master:8020/user/hive/warehouse/es_hive_cdr2/.hive-staging_hive_2015-03-02_14-09-08_285_4734041865540737822-2/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.elasticsearch.hadoop.hive.EsSerDe
              name: default.es_hive_cdr2

  Stage: Stage-2
    Stats-Aggr Operator

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.elasticsearch.hadoop.hive.EsSerDe
                  name: default.es_hive_cdr2

  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.elasticsearch.hadoop.hive.EsSerDe
                  name: default.es_hive_cdr2

  Stage: Stage-6
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://master:8020/user/hive/warehouse/es_hive_cdr2/.hive-staging_hive_2015-03-02_14-09-08_285_4734041865540737822-2/-ext-10000

I would really appreciate any help or guidance.

Answers


Try setting these table properties:

TBLPROPERTIES('es.resource' = 'myviews/myview', 'es.nodes' = 'hostname-of-es-cluster', 'es.port' = '9200', 'es.input.json' = 'false', 'es.write.operation' = 'index', 'es.index.auto.create' = 'yes','es.nodes.wan.only' = 'true');

Also set the following property in your elasticsearch.yml file:

network.host: _site_
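
Putting the pieces together, a minimal sketch of the full table definition might look like the following. It assumes the elasticsearch-hadoop jar is registered in the session with ADD JAR (the path shown is hypothetical), declares the table with STORED BY alone so that the storage handler supplies the SerDe, and uses a lowercase index name, since Elasticsearch rejects index names that contain uppercase letters. Without the storage handler, Hive appears to treat the table as a plain text table in the warehouse directory, which is what the TextInputFormat and HiveIgnoreKeyTextOutputFormat lines in the query plan suggest.

```sql
-- Hypothetical path: point this at wherever the elasticsearch-hadoop jar actually lives.
ADD JAR /path/to/elasticsearch-hadoop-hive-2.0.2.jar;

-- STORED BY alone is used here; the storage handler provides the SerDe.
-- Assumption: the index name is lowercased ('indexcdr'), because Elasticsearch
-- requires lowercase index names. The column `date` is backtick-quoted as a precaution.
CREATE EXTERNAL TABLE es_hive_cdr (
  traffic  bigint,
  calling  int,
  called   int,
  duration int,
  location string,
  `date`   string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes'             = '10.44.162.169',
  'es.port'              = '9200',
  'es.resource'          = 'indexcdr/typeCDR',
  'es.index.auto.create' = 'yes'
);

-- Load the rows from the source table into the Elasticsearch-backed table.
INSERT INTO TABLE es_hive_cdr
SELECT traffic_type_id, appelant, called_number, call_duration,
       location_number, date_heure_appel
FROM hive_cdr;
```

If the "not recognized" error persists after this, the jar is probably still not visible to the Hive service that runs the query, so it is worth checking where that service looks for auxiliary jars.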

