
Hadoop1 and Hadoop2 cleanup files using s3 storage

An HDFS filesystem can be configured to move deleted files into a trash directory instead of removing them right away. If you're just using it as a file store, you need to manually clean up all those trashed files. You can do this with the following HDFS command:

hadoop fs -Dfs.defaultFS=s3://myS3bucket -Dfs.trash.interval=0 -expunge
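
For example, here's a short sketch of checking what's sitting in the trash before purging it. The bucket name and trash path are placeholders; the exact trash location depends on your user and Hadoop version:

# list what's currently in the trash (path varies per user/version)
hadoop fs -Dfs.defaultFS=s3://myS3bucket -ls /user/$USER/.Trash
# then purge it immediately (trash interval 0 disables the retention delay)
hadoop fs -Dfs.defaultFS=s3://myS3bucket -Dfs.trash.interval=0 -expunge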

Hadoop2 Commands

In hadoop 1.X:

HADOOP_HOME=/usr/lib/hadoop

Basically you'll use $HADOOP_HOME/bin/hadoop for your commands.

In hadoop 2.X you'll still use $HADOOP_HOME/bin/hadoop for your commands, plus:

/usr/lib/hadoop-hdfs/bin/hdfs
/usr/lib/hadoop-mapreduce/bin/mapred
/usr/lib/hadoop-yarn/bin/yarn

Typical commands you'll want to use are:

hdfs dfs -ls
hdfs balancer
mapred job -list
yarn jar
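
To make that concrete, here's a quick sketch of each of those in use; the example jar path is hypothetical and will differ by distribution:

hdfs dfs -ls /user/$USER                 # filesystem operations now go through the hdfs binary
hdfs balancer -threshold 10              # rebalance blocks across DataNodes
mapred job -list                         # list running MapReduce jobs
yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 4 100   # submit a job through YARN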

Install nginx in CentOS 64 using yum

###############
# Install nginx
###############

# add the repo for nginx
export NGINX_REPO_FILE=/etc/yum.repos.d/nginx.repo
touch $NGINX_REPO_FILE
chmod 644 $NGINX_REPO_FILE
chown root:root $NGINX_REPO_FILE
echo "[nginx]" > $NGINX_REPO_FILE
echo "name=nginx repo" >> $NGINX_REPO_FILE
echo 'baseurl=http://nginx.org/packages/centos/$releasever/$basearch/' >> $NGINX_REPO_FILE
echo "gpgcheck=0" >> $NGINX_REPO_FILE
echo "enabled=1" >> $NGINX_REPO_FILE

# Verify it worked
cat $NGINX_REPO_FILE
yum repolist

# install nginx
yum -y install nginx.x86_64

# config:  /etc/nginx/nginx.conf
# config:  /etc/sysconfig/nginx
# pidfile: /var/run/nginx.pid
# User configs
#   /etc/nginx/conf.d/*.conf
# Log location
#   /var/log/nginx/access.log
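
Once it's installed, here's a quick sketch of starting it and checking that it responds. This assumes a SysV-init CentOS (use systemctl instead on systemd hosts):

# start nginx and have it come up on boot
service nginx start
chkconfig nginx on
# sanity check: the default page should answer on port 80
curl -I http://localhost/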

Hadoop distcp s3 vs s3n use on cmdline and limits

Like most of my posts this is short.

To use distcp with s3://key:secret@bucket/ you must have it set up as a filesystem configured with the NameNode, so you'll basically always use it like s3://bucket-name/. The s3 implementation saves files as blocks and scrambles the names, so it can't be used standalone. If you want to use S3 standalone, use s3n. You can test with:

hadoop fs -ls s3://bucket-name/

If you can access it, great, it works.

s3n, which stands for the S3 native filesystem, has Amazon's 5 GB file size limitation.

That's the short of it.

-Steve
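
For example, here's a sketch of copying a directory from HDFS out to a native S3 bucket with distcp. The NameNode address, paths, and bucket names are all placeholders, and the AWS keys are assumed to be configured in core-site.xml (see the next post):

# quick reachability check on the block filesystem
hadoop fs -ls s3://bucket-name/
# copy a directory from HDFS into a native-format bucket (readable by other S3 tools)
hadoop distcp hdfs://namenode:8020/data/logs s3n://my-native-bucket/backups/logs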

S3 and S3N Config in Hadoop2 where to put awsAccessKeyId and awsSecretAccessKey

Short answer is that in Hadoop2 both the S3 and S3N settings go in core-site.xml:

<!-- To Setup S3 Block Filesystem -->
<property>
  <name>fs.default.name</name>
  <value>s3://BUCKET</value>
</property>
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET</value>
</property>

<!-- To Setup S3N Native Filesystem -->
<property>
  <name>fs.default.name</name>
  <value>s3n://BUCKET</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>SECRET</value>
</property>
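
If you just want to test credentials without editing core-site.xml, here's a sketch of passing them on a single command line (BUCKET, ID, and SECRET are placeholders):

hadoop fs -Dfs.s3n.awsAccessKeyId=ID -Dfs.s3n.awsSecretAccessKey=SECRET -ls s3n://BUCKET/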

iPython a great IDE basically

My new favorite IDE for Python is now iPython. I've been doing more work with scientific computing and machine learning, which has led me to discover iPython. What a pleasure it is to work with, and it's being developed at Berkeley right next door. If you like better interactivity, documentation, autocomplete and stack traces, just use iPython. Check it out:  http://ipython.org/
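
If you want to try it, here's a minimal install-and-launch sketch, assuming pip is on your path (the notebook may need extra dependencies such as tornado and pyzmq):

# install and start the interactive shell
pip install ipython
ipython
# or the browser-based notebook interface
ipython notebook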