Thursday, July 29, 2010

FlumeJava: An easier way to write map-reduce chains

Some Google guys have recently published a paper about a Java library that helps with developing and optimizing chains of map-reduce jobs. It is called FlumeJava.

The library is very interesting. From my point of view, it clearly simplifies map-reduce development. Instead of writing your jobs by hand and chaining them manually, you define your computations in plain Java on top of some immutable collections, and leave the library the responsibility of finding the best execution plan.
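To make the idea more concrete, here is a minimal sketch of a deferred, immutable collection in plain Java. It is not FlumeJava's API: the names `DeferredPipelineSketch`, `Deferred`, `flatMap`, `map`, `run` and `wordCount` are all hypothetical, and everything runs in memory on one machine instead of on a map-reduce cluster. It only illustrates the principle: each call records a step and returns a new handle, and the recorded steps are fused and executed in a single pass when the result is finally requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch, NOT FlumeJava's API: a tiny in-memory deferred collection.
class DeferredPipelineSketch {

    /** Immutable handle describing a computation; nothing runs until run(). */
    static final class Deferred<T> {
        private final List<String> source;              // raw input lines
        private final Function<String, List<T>> fused;  // all recorded steps, fused into one function

        private Deferred(List<String> source, Function<String, List<T>> fused) {
            this.source = source;
            this.fused = fused;
        }

        static Deferred<String> of(List<String> lines) {
            List<String> copy = Collections.unmodifiableList(new ArrayList<>(lines));
            return new Deferred<>(copy, line -> Collections.singletonList(line));
        }

        /** Records a one-to-many step and returns a NEW handle; this one is unchanged. */
        <R> Deferred<R> flatMap(Function<T, List<R>> step) {
            Function<String, List<R>> next = line -> {
                List<R> out = new ArrayList<>();
                for (T item : fused.apply(line)) {
                    out.addAll(step.apply(item));
                }
                return out;
            };
            return new Deferred<>(source, next);
        }

        /** Records a one-to-one step, expressed through flatMap. */
        <R> Deferred<R> map(Function<T, R> step) {
            return this.<R>flatMap(item -> Collections.singletonList(step.apply(item)));
        }

        /** Executes every recorded step in a single pass over the input. */
        List<T> run() {
            List<T> out = new ArrayList<>();
            for (String line : source) {
                out.addAll(fused.apply(line));
            }
            return out;
        }
    }

    /** Word count written as one readable chain instead of separate hand-written jobs. */
    static Map<String, Integer> wordCount(List<String> lines) {
        Deferred<String> words = Deferred.of(lines)
                .flatMap(line -> Arrays.asList(line.trim().split("\\s+")))
                .map(word -> word.toLowerCase());

        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words.run()) {   // the only place where work actually happens
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick fox", "The lazy dog");
        System.out.println(wordCount(lines)); // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

The split and lower-case steps are written as two separate calls, but `run()` applies them together while walking the input once. A real library would make this kind of decision across whole map-reduce jobs, which is far beyond what this toy does.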

When you develop with standard map-reduce jobs, your code ends up split into several pieces. That makes it fragmented and less clear. FlumeJava lets you keep your business logic together in one place.

Well, let’s see if those amazing Hadoop guys implement something similar for the community.

UPDATE (2010-11-12): The amazing Hadoop guys have started to move. Ted Dunning has created Plume, the "Hadoop FlumeJava". My friend Pere Ferrera is also collaborating on the Plume development.

Amazon Web Services Rocks!

During my two months working on “Hadooping” the international classifieds search engine, I have had the opportunity to test Amazon Web Services. I was greatly impressed by services like S3, but especially by Amazon EC2. It completely changes the way systems are managed compared to traditional hosting services.

First of all, the system is completely flexible and immediate. You can install whichever image you want with your favorite OS, start as many instances as you need with just a few clicks, make hot backups (snapshots), and use as much storage as you want.

Second, it is really easy to use. Much easier than any traditional hosting service I have ever used.

Third, you only pay for the time and resources you actually use, at a competitive price. That’s good, because it suits small companies or users running typical websites as well as big companies managing computing clusters. As an example of the low barrier to entry and the flexibility, you could launch a small machine for one hour for just $0.09, booking it instantly.
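To get a feeling for the numbers, here is a small back-of-the-envelope sketch in Java. The only figure taken from this post is the $0.09 per hour for a small machine. The class and method names (`Ec2CostSketch`, `cost`) are made up for illustration, and the sketch assumes billing in whole instance-hours and ignores storage and bandwidth.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration only; not part of any Amazon SDK.
class Ec2CostSketch {

    // Hourly price of a small instance as quoted in this post.
    static final BigDecimal SMALL_INSTANCE_HOURLY_USD = new BigDecimal("0.09");

    /** Cost in USD of running `instances` small machines for `hours` whole hours each. */
    static BigDecimal cost(int instances, int hours) {
        if (instances < 0 || hours < 0) {
            throw new IllegalArgumentException("instances and hours must be non-negative");
        }
        long instanceHours = (long) instances * hours;
        return SMALL_INSTANCE_HOURLY_USD.multiply(BigDecimal.valueOf(instanceHours));
    }

    public static void main(String[] args) {
        System.out.println(cost(1, 1));   // prints 0.09
        System.out.println(cost(20, 3));  // prints 5.40
    }
}
```

So under these assumptions a 20-machine cluster for a three-hour job costs about as much as a sandwich, and you pay nothing once the instances are shut down.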

I definitely encourage everyone in need of hosting to start using Amazon Web Services.