Showing posts from 2014

Tmux with Native Copy and Paste

This is a great productivity tip: starting with tmux 1.8, you can copy and paste natively between tmux and OS X. The folks at thoughtbot put it together. You use the new copy-pipe command to set up the defaults in your tmux config. See their post for the details.
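
Below is a minimal sketch of the kind of config the thoughtbot post describes, wrapped in a small shell function so it can be applied to any config file. It assumes tmux 1.8 through 2.3 (the -t vi-copy key tables were removed in tmux 2.4) and OS X with reattach-to-user-namespace installed. The function name add_tmux_copy_pipe_bindings is hypothetical, not from the post.

```shell
#!/usr/bin/env bash
# Appends copy-pipe bindings (tmux 1.8 syntax) to a tmux config file.
# Assumes OS X with reattach-to-user-namespace installed, e.g. via
# `brew install reattach-to-user-namespace`. Hypothetical helper name.
add_tmux_copy_pipe_bindings() {
  local conf="${1:-$HOME/.tmux.conf}"
  # Don't append a second time if copy-pipe bindings are already there.
  if [ -f "$conf" ] && grep -q 'copy-pipe' "$conf"; then
    echo "copy-pipe bindings already present in $conf"
    return 0
  fi
  cat >> "$conf" <<'EOF'
# Use vi keys in copy mode
setw -g mode-keys vi
# v begins a selection, as in Vim
bind-key -t vi-copy v begin-selection
# y pipes the selection to the OS X clipboard
bind-key -t vi-copy y copy-pipe "reattach-to-user-namespace pbcopy"
# Make Enter copy to the clipboard too
unbind -t vi-copy Enter
bind-key -t vi-copy Enter copy-pipe "reattach-to-user-namespace pbcopy"
EOF
}
```

Usage: run add_tmux_copy_pipe_bindings, then reload with tmux source-file ~/.tmux.conf. On tmux 2.4 and later the equivalent binding is bind-key -T copy-mode-vi y send-keys -X copy-pipe-and-cancel "reattach-to-user-namespace pbcopy".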

Install a Single-Node YARN / HBase Cluster with a Single Command

Setting up a cluster can be time-consuming, especially with all the services involved (HBase, ZooKeeper, YARN, Hive, Oozie, Ambari, HCat, HDFS, WebHCat). Use this script to set it all up with a single command.

Set Up a Single-Node Hadoop 2 Cluster with a Single Command

Set up a Hadoop 2 cluster with a single command: curl -sSL | bash -s -- -r

For development purposes we wanted an easy way to set up an environment that's been tested everywhere we work: Vagrant, AWS, and DigitalOcean. This makes things much easier. Feel free to help add compatibility with other Linux distros.
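
The one-liner relies on bash -s: bash reads the script to run from stdin (here, curl's output), and everything after -- becomes that script's positional parameters. Without the --, bash would parse -r as one of its own invocation options (restricted mode) instead of handing it to the script. Here is a minimal sketch of the mechanism, using a hypothetical one-line stand-in for the downloaded installer; it says nothing about what -r means to the real script.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the script curl would download.
fake_installer='echo "installer received: $*"'
# bash -s reads commands from stdin; arguments after -- become $1, $2, ...
printf '%s\n' "$fake_installer" | bash -s -- -r
```

Running it prints installer received: -r, showing the flag reached the piped script rather than bash itself.
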
Good read on HDFS small file compaction:

With that decided, we then looked for options to aggregate and compact small files on Hadoop, identifying three possible solutions:

filecrush - a highly configurable tool by Edward Capriolo to “crush” small files on HDFS. It supports a rich set of configuration arguments and is available as a jarfile (download it here) ready to run on your cluster. It’s a sophisticated tool - for example, by default it won’t bother crushing a file which is within 75% of the HDFS block size already. Unfortunately, it does not work yet with Amazon’s s3:// paths, only hdfs:// paths - and our pull request to add this functionality is incomplete.

Consolidator - a Hadoop file consolidation tool from the dfs-datastores library, written by Nathan Marz. There is scant documentation for this - we could only find one paragraph, in this email thread. It has fewer capabilities than filecrush, and could do with a CLI-like wrapper to invoke it (we started w
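
To make the 75% rule concrete, here is an illustrative sketch (not filecrush's code) that applies it to a list of file sizes, reading "within 75% of the block size" as "at least 75% of it": such files are skipped, and anything smaller is a crush candidate. The helper name crush_candidates and the 128 MiB default (Hadoop 2's default dfs.blocksize) are assumptions.

```shell
#!/usr/bin/env bash
# Reads "SIZE_IN_BYTES PATH" lines on stdin and prints the paths of files
# smaller than THRESHOLD_PCT percent of BLOCK_SIZE (the crush candidates).
# Hypothetical helper; defaults: 128 MiB block size, 75% threshold.
crush_candidates() {
  local block_size="${1:-134217728}" threshold_pct="${2:-75}"
  # Compare size*100 against block_size*pct to avoid fractional math.
  awk -v bs="$block_size" -v pct="$threshold_pct" '$1 * 100 < bs * pct { print $2 }'
}
```

On a cluster you might feed it listing output, keeping only files (size is column 5, path column 8), with /path/to/dir as a placeholder: hdfs dfs -ls -R /path/to/dir | awk '$1 !~ /^d/ {print $5, $8}' | crush_candidates. Paths containing spaces would need more careful parsing.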

One-Liner to Download the Java 7 JDK

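Download the Java 7 JDK with one line: wget --quiet --no-cookies -O /vagrant/jdk-7u45-linux-x64.rpm --header 'Cookie:;' --no-check-certificate creates=jdk-7u45-linux-x64.rpm

The trailing creates=jdk-7u45-linux-x64.rpm is an Ansible command-module argument (skip the task if that file already exists), not a wget option, so drop it when running wget directly in a shell.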
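
For use outside Ansible, here is a minimal plain-shell sketch of the same creates= guard: skip the download when the target file already exists. The helper name download_once is hypothetical, and the JDK URL and cookie header aren't reproduced here, so the caller supplies them.

```shell
#!/usr/bin/env bash
# Downloads URL to DEST with wget unless DEST already exists.
# Extra arguments (e.g. --header ...) are passed through to wget.
# Hypothetical helper mirroring Ansible's creates= behaviour.
download_once() {
  local url="$1" dest="$2"
  shift 2
  if [ -e "$dest" ]; then
    echo "skipping download: $dest already exists"
    return 0
  fi
  if ! wget --quiet --no-cookies --no-check-certificate "$@" -O "$dest" "$url"; then
    # wget -O leaves an empty or partial file on failure; remove it so a rerun retries.
    rm -f "$dest"
    return 1
  fi
}
```

Usage, with JDK_URL and COOKIE_HEADER as placeholders you fill in: download_once "$JDK_URL" /vagrant/jdk-7u45-linux-x64.rpm --header "$COOKIE_HEADER".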

Netstat to Find Your Kafka Port

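If you're looking to see which address Kafka is bound to, try netstat:

$ netstat -tulpn

Here -t and -u select TCP and UDP sockets, -l limits the output to listening sockets, -p shows the owning PID and program name, and -n prints numeric addresses and ports instead of resolving names. Kafka runs on the JVM, so its process shows up as java.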
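
To narrow netstat's output to one port, a small awk filter helps. This is a minimal sketch, assuming the Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name) and Kafka's default broker port 9092. The helper name listen_addr_for_port is hypothetical.

```shell
#!/usr/bin/env bash
# Reads `netstat -tulpn` output on stdin and prints "LOCAL_ADDRESS PID/PROGRAM"
# for TCP sockets in LISTEN state on the given port (default 9092).
# UDP lines have no State column, so $6 is never "LISTEN" for them.
listen_addr_for_port() {
  local port="${1:-9092}"
  awk -v port="$port" '$6 == "LISTEN" && $4 ~ (":" port "$") { print $4, $7 }'
}
```

Usage: sudo netstat -tulpn | listen_addr_for_port 9092. A local address of 0.0.0.0 or :: means Kafka is listening on all interfaces. Without root, netstat can't show the PID/program for processes owned by other users.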