Error while copying from S3 to HDFS

I am trying to copy some files from an S3 bucket to the HDFS of my EMR cluster, but I am getting the following error:

Exception in thread "main" java.lang.RuntimeException: Error running job
    at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:771)
    at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:580)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://10.87.26.26:9000/tmp/33e4f3b9-d29a-49e8-9706-ea70e07e3ff2/files
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
    at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:751)
    ... 9 more

The command I am using is:

./elastic-mapreduce --jobflow  j-12345678 --jar /home/hadoop/lib/emr-s3distcp-1.0.jar --args '--src,s3n://my-bucket/data/,--dest,hdfs:///data/in,--srcPattern,xyz01-1-1*ped*' --step-name "Copy input files to HDFS" --wait-for-steps

I ran the sample word-count job to check whether there is any issue with HDFS, and it ran fine.

Can anyone please help me with this? If any more info is needed, please let me know and I will update the description.

Answers


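Usually it's the --srcPattern '<regex>' argument. It takes a regular expression, not a shell glob, so in xyz01-1-1*ped* each * means "zero or more of the preceding character" rather than "any characters". If the pattern matches nothing, there is nothing to copy, which is likely why the job fails with "Input path does not exist" on its temporary file list. You can also use hadoop fs -cp s3://src/file1.something /my/output/path/ to test the copy with a single file and then adjust your regex. Starting the pattern with .* (any character, zero or more times) should also relax the matching.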
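
To see why a glob-style pattern can silently match nothing, here is a minimal sketch in Python. It assumes the pattern is applied as a regular expression against the whole path, which is my understanding of how S3DistCp treats --srcPattern and not something confirmed in this thread. Python's re module stands in for Java's regex engine (the two agree on these simple patterns), and the file names are made up for illustration.

```python
import re

# Hypothetical object paths, invented for this example only.
paths = [
    "s3n://my-bucket/data/xyz01-1-1-sample-ped-001.txt",
    "s3n://my-bucket/data/other-file.txt",
]

# Pattern from the question, written like a shell glob.
# Read as a regex it means: "xyz01-1-", zero or more "1", "pe", zero or more "d".
glob_style = r"xyz01-1-1*ped*"

# Relaxed regex: ".*" stands for "any characters", including the path prefix.
regex_style = r".*xyz01-1-1.*ped.*"


def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full (assumed behaviour)."""
    compiled = re.compile(pattern)
    return [p for p in candidates if compiled.fullmatch(p)]


print("glob-style :", matching(glob_style, paths))
print("regex-style:", matching(regex_style, paths))
```

Under that assumption the glob-style pattern selects no paths at all, while the relaxed one selects only the xyz01-1-1...ped... file. Checking the pattern like this against a few real key names before submitting the step is cheaper than waiting for the job to fail.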

It would be great to know whether regex non-matches get logged, and where.

