Tuesday, July 8, 2008

Google Protocol Buffers released as Open Source

Google has released its Protocol Buffers library, used for serializing structured data, as open source (documentation). Google uses this library extensively in its systems for storing and sharing data. I would guess that most of the files stored in their internal GFS are encoded with Protocol Buffers.

It has several interesting features. First, integer types use a variable-length encoding, so small values take fewer bytes than large ones. This can lead to significant storage savings when dealing with large amounts of data.
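To give an idea of where the savings come from, here is a minimal Python sketch of the base-128 "varint" scheme described in the Protocol Buffers encoding documentation: each byte carries 7 bits of the value, and the high bit says whether another byte follows. The function names encode_varint and decode_varint are my own, chosen for illustration, and are not part of the library's API.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (illustrative helper)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F        # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


if __name__ == "__main__":
    for n in (1, 300, 2**32 - 1):
        print(n, "->", len(encode_varint(n)), "byte(s)")
```

With this scheme a value below 128 takes one byte instead of the four or eight of a fixed-width integer, which is where the savings show up when most stored numbers are small.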

The second feature is that the data schema can change while forward compatibility is maintained. This is important because schema changes are common in practice, and forward compatibility lets old systems and data coexist with new ones.
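A rough way to picture this: on the wire, every field is prefixed with a key holding its field number and a wire type, so a reader can step over fields it does not know about. The Python sketch below is my own simplified illustration of that idea, not the library's code. It handles only two wire types (0 for varints, 2 for length-delimited data), and all function names in it are hypothetical.

```python
def write_varint(value: int) -> bytes:
    """Base-128 varint encoding for non-negative integers."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)
        else:
            out.append(low_bits)
            return bytes(out)


def read_varint(data: bytes, pos: int):
    """Return (value, next_position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def encode_int_field(field_number: int, value: int) -> bytes:
    # Key = (field_number << 3) | wire_type, with wire type 0 for varints.
    return write_varint((field_number << 3) | 0) + write_varint(value)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    # Wire type 2: a length prefix followed by the raw bytes.
    return write_varint((field_number << 3) | 2) + write_varint(len(payload)) + payload


def parse_known_fields(data: bytes, known_fields: set) -> dict:
    """Read a message, keeping known fields and skipping the rest."""
    pos = 0
    result = {}
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(data, pos)
        elif wire_type == 2:
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type not handled in this sketch")
        if field_number in known_fields:
            result[field_number] = value
        # Unknown field numbers are simply skipped, not treated as errors.
    return result


if __name__ == "__main__":
    # A "new" writer adds field 3, which the "old" reader has never heard of.
    message = (encode_int_field(1, 150)
               + encode_bytes_field(2, b"bob")
               + encode_int_field(3, 7))
    print(parse_known_fields(message, known_fields={1, 2}))
```

The old reader still gets fields 1 and 2 back even though the message carries an extra field, which is the property that lets old and new systems share data.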

The third feature is its availability for C++, Java and Python, which makes it easy to share data between programs written in these three languages. Facebook has recently released Thrift, another approach to serialization and RPC.

More information about the topics in this post, and a comparison with Hadoop serialization, can be found on Tom White's blog.
