Thursday, April 17, 2008

Bloom Filter

As Wikipedia says, the bloom filter

is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Elements can be added to the set, but not removed (though this can be addressed with a counting filter). The more elements that are added to the set, the larger the probability of false positives.

This interesting structure needs little memory resources, about 10 bits per item for 1% false positive rate.

This structure is used in BigTable in order to reduce the look-up time of non existing keys.

More information and a talk on Bloom Filters here.

No comments: