Amazon Web Services and Hadoop

This is a pretty cool project using Hadoop and Amazon Web Services.

From the New York Times article cited below:
This all adds up to terabytes of data, in a less-than-web-friendly format. So reusing the EC2/S3/Hadoop method I discussed back in November, I got to work writing a few lines of code. Using Amazon Web Services, Hadoop and our own code, we ingested 405,000 very large TIFF images, 3.3 million articles in SGML and 405,000 xml files mapping articles to rectangular regions in the TIFF’s. This data was converted to a more web-friendly 810,000 PNG images (thumbnails and full images) and 405,000 JavaScript files — all of it ready to be assembled into a TimesMachine. By leveraging the power of AWS and Hadoop, we were able to utilize hundreds of machines concurrently and process all the data in less than 36 hours.
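The pipeline the article describes fans each source TIFF out into several derived files: two PNGs (a thumbnail and a full image) plus one JavaScript file mapping articles to image regions, which matches the counts quoted (405,000 TIFFs into 810,000 PNGs and 405,000 JS files). A minimal sketch of how that fan-out could look as a Hadoop Streaming-style mapper is below; the function names and file-naming scheme are my own assumptions for illustration, not the Times' actual code, and the real job would invoke an image-conversion step rather than just emit output names.

```python
import sys

def plan_outputs(tiff_path):
    # Hypothetical helper: for one source TIFF, name the derived
    # artifacts the article describes -- a full-size PNG, a thumbnail
    # PNG, and a JavaScript file mapping articles to TIFF regions.
    base = tiff_path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return [f"{base}_full.png", f"{base}_thumb.png", f"{base}.js"]

def mapper(lines):
    # Hadoop Streaming convention: one input record per line on stdin,
    # tab-separated key/value pairs on stdout. Here the key is the
    # source TIFF path and the value is one derived output file.
    for line in lines:
        src = line.strip()
        if not src:
            continue
        for out in plan_outputs(src):
            yield f"{src}\t{out}"

if __name__ == "__main__":
    for record in mapper(sys.stdin):
        print(record)
```

Run under Hadoop Streaming (e.g. `-mapper mapper.py`), each of the hundreds of EC2 machines would process a disjoint slice of the 405,000 input paths in parallel, which is what makes the under-36-hours figure plausible.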
