Incremental sqoop from oracle to hdfs with condition

I am doing a incremental sqooping from to hdfs oracle giving where condition like

(LST_UPD_TMST >TO_TIMESTAMP('2016-05-31T18:55Z', 'YYYY-MM-DD"T"HH24:MI"Z"')
 AND LST_UPD_TMST <= TO_TIMESTAMP('2016-09-13T08:51Z', 'YYYY-MM-DD"T"HH24:MI"Z"'))

But it is not using the index. How can I force an Index so that sqoop can be faster by considering only filtered records.

What is the best option to do incremental sqoop. Table size in oracle is in TBs. Table has billions rows and after where condition it is in some million

Answers


You can use --where or --query with where condition in select to filter import results

I was not sure about your sqoop full command, just give a try in this way

sqoop import 
    --connect jdbc:oracle:thin:@//db.example.com/dbname \
    --username dbusername \
    --password dbpassword \
    --table tablename \
    --columns "column,names,to,select,in,comma,separeted" \
    --where "(LST_UPD_TMST >TO_TIMESTAMP('2016-05-31T18:55Z', 'YYYY-MM-DD\"T\"HH24:MI\"Z\"') AND LST_UPD_TMST <= TO_TIMESTAMP('2016-09-13T08:51Z', 'YYYY-MM-DD\"T\"HH24:MI\"Z\"'))" \
    --target-dir {hdfs/location/to/save/data/from/oracle} \
    --incremental lastmodified \
    --check-column LST_UPD_TMST \
    --last-value {from Date/Timestamp to Sqoop in incremental}

Check more details about sqoop incremental load


Update

For incremental imports Sqoop saved job is recommended to maintain --last-value automatically.

sqoop job --create {incremental job name} \
    -- import
    --connect jdbc:oracle:thin:@//db.example.com/dbname \
    --username dbusername \
    --password dbpassword \
    --table tablename \
    --columns "column,names,to,select,in,comma,separeted" \
    --incremental lastmodified \
    --check-column LST_UPD_TMST \
    --last-value 0 

Here --last-value 0 to import from start for first time then latest value will be passed automatically in next invocation by sqoop job


Need Your Help

Informix SQL to get date range that changes automatically

sql informix

Please help me with an Informx SQL query to be run every Friday.