Hadoop distcp s3 vs s3n use on cmdline and limits

Like most of my posts, this is short.
To use distcp with s3://key:secret@bucket/, you must have it set up as a file system configured with the NameNode. So you'll basically always use it like s3://bucket-name/

The s3 implementation here is a block file system: it saves files as blocks, the way HDFS does, and scrambles the names. It can't be used standalone, so other S3 tools can't read what it writes.

If you want to use S3 standalone, use s3n.

You can test with hadoop fs -ls s3://bucket-name/. If you can list the bucket, great, it works.
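Here's a rough sketch of how the check and the copy might look with s3n. The bucket name, the HDFS source path, and the s3_uri and run helpers are all made up for illustration, and it assumes your S3 credentials are already configured for Hadoop. By default it only prints the commands; set DRY_RUN=0 to really run them.

```shell
#!/bin/sh
# Sketch only: BUCKET and SRC are hypothetical placeholders.
BUCKET="bucket-name"
SRC="hdfs:///user/me/logs"

# Hypothetical helper: build an S3 URI from a scheme (s3 or s3n),
# a bucket name, and an optional path inside the bucket.
s3_uri() {
    printf '%s://%s/%s\n' "$1" "$2" "${3:-}"
}

# Hypothetical helper: print the command unless DRY_RUN=0,
# in which case actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# First check that the bucket can be listed at all.
run hadoop fs -ls "$(s3_uri s3n "$BUCKET")"

# Then copy from HDFS into the bucket as plain S3 objects.
run hadoop distcp "$SRC" "$(s3_uri s3n "$BUCKET" logs)"
```

In dry-run mode this just echoes the hadoop fs -ls and hadoop distcp lines with the s3n:// URIs filled in, so you can eyeball them before touching the cluster.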

s3n, which stands for the S3 native file system, is subject to Amazon's 5 GB file size limit.

That's the short of it.

