Hadoop1 and Hadoop2 cleanup files using s3 storage

HDFS can be configured to move deleted files to a trash dir. If you're just using S3 as a file store, nothing empties that trash for you, so you need to clean up the files manually. You can do this with the following command:

    hadoop fs -Dfs.defaultFS=s3://myS3bucket -Dfs.trash.interval=0 -expunge
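If you clean up more than one bucket, a small wrapper saves retyping the flags. This is only a sketch: `expunge_cmd` is a made-up helper name, and it prints the command rather than running it so you can check it first.

```shell
#!/bin/sh
# expunge_cmd BUCKET: print the expunge command for one S3 bucket.
# (Hypothetical helper name; the flags are the ones from the command above.)
expunge_cmd() {
    bucket="$1"
    printf 'hadoop fs -Dfs.defaultFS=s3://%s -Dfs.trash.interval=0 -expunge\n' "$bucket"
}

# Show the command for the example bucket.
expunge_cmd myS3bucket

# To actually run it on a host with Hadoop installed:
#   sh -c "$(expunge_cmd myS3bucket)"
```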

Hadoop2 Commands

In Hadoop 1.X, HADOOP_HOME=/usr/lib/hadoop. Basically you'll use $HADOOP_HOME/bin/hadoop for your commands.

In Hadoop 2.X, basically you'll use $HADOOP_HOME/bin/hadoop for your commands, plus:

    /usr/lib/hadoop-hdfs/bin/hdfs
    /usr/lib/hadoop-mapreduce/bin/mapred
    /usr/lib/hadoop-yarn/bin/yarn

Typical commands you'll want to use are:

    hdfs dfs -ls
    hdfs balancer
    mapred job -list
    yarn jar
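To see quickly which of these launchers a box actually has, something like the following works. It is a rough sketch: `check_bin` is a made-up helper name, and the paths are just the ones listed above, so adjust them if your distribution installs elsewhere.

```shell
#!/bin/sh
# Fall back to the Hadoop 1.X location from the post if HADOOP_HOME is unset.
HADOOP_HOME="${HADOOP_HOME:-/usr/lib/hadoop}"

# check_bin PATH: print "found PATH" if PATH is an executable file,
# otherwise "missing PATH". (Hypothetical helper name.)
check_bin() {
    if [ -x "$1" ]; then
        echo "found $1"
    else
        echo "missing $1"
    fi
}

# Check the classic launcher plus the three Hadoop 2.X launchers.
for bin in \
    "$HADOOP_HOME/bin/hadoop" \
    /usr/lib/hadoop-hdfs/bin/hdfs \
    /usr/lib/hadoop-mapreduce/bin/mapred \
    /usr/lib/hadoop-yarn/bin/yarn
do
    check_bin "$bin"
done
```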

Install nginx in CentOS 64 using yum

###############
# Install nginx
###############

# add the nginx repo
export NGINX_REPO_FILE=/etc/yum.repos.d/nginx.repo
touch "$NGINX_REPO_FILE"
chmod 644 "$NGINX_REPO_FILE"
chown root:root "$NGINX_REPO_FILE"
echo "[nginx]" > "$NGINX_REPO_FILE"
echo "name=nginx repo" >> "$NGINX_REPO_FILE"
echo 'baseurl=$releasever/$basearch/' >> "$NGINX_REPO_FILE"
echo "gpgcheck=0" >> "$NGINX_REPO_FILE"
echo "enabled=1" >> "$NGINX_REPO_FILE"

# Verify it worked
cat "$NGINX_REPO_FILE"
yum repolist

# install nginx
yum -y install nginx.x86_64

# config: /etc/nginx/nginx.conf
# config: /etc/sysconfig/nginx
# pidfile: /var/run/
# User configs
# /etc/nginx/conf.d/*.conf;
# Log location
# /var/log/nginx/access.log
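The repo file can also be written in one step with a here-document instead of a string of echo lines. This is a sketch, not a drop-in replacement: `write_nginx_repo` is a made-up helper name, the demo writes to a scratch file rather than /etc/yum.repos.d, and the baseurl line is copied exactly as it appears in the script above.

```shell
#!/bin/sh
# write_nginx_repo PATH: write the nginx yum repo definition to PATH.
# (Hypothetical helper name, for illustration.)
write_nginx_repo() {
    repo_file="$1"
    # The quoted 'EOF' keeps $releasever and $basearch literal,
    # so yum expands them later instead of the shell.
    cat > "$repo_file" <<'EOF'
[nginx]
name=nginx repo
baseurl=$releasever/$basearch/
gpgcheck=0
enabled=1
EOF
    chmod 644 "$repo_file"
}

# Demo against a scratch file. On a real host, run as root and pass
# /etc/yum.repos.d/nginx.repo instead.
demo_file="$(mktemp)"
write_nginx_repo "$demo_file"
cat "$demo_file"
rm -f "$demo_file"
```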

Hadoop distcp s3 vs s3n use on cmdline and limits

Like most of my posts, this one is short.

To use distcp with s3://key:secret@bucket/, you must have it set up as a file system configured with the NameNode. So you'll basically always use it like s3://bucket-name/. The s3 implementation here saves files as blocks, Hadoop-style, and scrambles the names. It can't be used standalone. If you want to use S3 standalone, use s3n.

You can test with hadoop fs -ls s3://bucket-name/. If you can access it, great, it works.

s3n, which stands for the S3 native protocol, is subject to Amazon's 5 Gig file size limit.

That's the short of it.

-Steve
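Since the s3n limit bites on large files, it can help to check sizes before copying. A minimal sketch, assuming "5 Gig" means 5 * 1024^3 bytes; `fits_s3n` and `S3N_MAX_BYTES` are made-up names for illustration.

```shell
#!/bin/sh
# Assumption: the 5 Gig limit is taken as 5 * 1024^3 bytes.
S3N_MAX_BYTES=$((5 * 1024 * 1024 * 1024))

# fits_s3n SIZE_IN_BYTES: succeed if a file of that size is within the limit.
# (Hypothetical helper name.)
fits_s3n() {
    [ "$1" -le "$S3N_MAX_BYTES" ]
}

# Try a small file, one exactly at the limit, and one a byte over.
for size in 1048576 5368709120 5368709121; do
    if fits_s3n "$size"; then
        echo "$size bytes: ok for s3n"
    else
        echo "$size bytes: too large for s3n"
    fi
done
```

For a local file, `wc -c < file` gives the size in bytes to pass in.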

S3 and S3N Config in Hadoop2 where to put awsAccessKeyId and awsSecretAccessKey

Short answer is that in Hadoop2 both S3 and S3N are set up in "core-site.xml".

# To Setup S3 Block Filesystem (s3://BUCKET)
fs.s3.awsAccessKeyId = ID
fs.s3.awsSecretAccessKey = SECRET

# To Setup S3N Native Filesystem (s3n://BUCKET)
fs.s3n.awsAccessKeyId = ID
fs.s3n.awsSecretAccessKey = SECRET
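In core-site.xml each of these goes in its own property element with a name and a value. Here is a sketch that prints them in that layout so you can paste them into your existing file; `hadoop_property` is a made-up helper name, and ID and SECRET are the same placeholders as above.

```shell
#!/bin/sh
# hadoop_property NAME VALUE: print one <property> element in the
# layout used by Hadoop's *-site.xml files. (Hypothetical helper name.)
hadoop_property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Print the four credential properties inside a <configuration> wrapper.
# ID and SECRET are placeholders; substitute your own values.
echo '<configuration>'
hadoop_property fs.s3.awsAccessKeyId ID
hadoop_property fs.s3.awsSecretAccessKey SECRET
hadoop_property fs.s3n.awsAccessKeyId ID
hadoop_property fs.s3n.awsSecretAccessKey SECRET
echo '</configuration>'
```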

iPython a great IDE basically

My new favorite IDE for Python is now iPython. I've been doing more work with scientific computing and machine learning, which has led me to discover iPython. What a pleasure it is to work with, and it's being developed at Berkeley, right next door. If you like better interactivity, documentation, autocomplete and stack traces, just use iPython. Check it out: