Acceptably inaccurate: probabilistic data structures

Scale

06/06/2016 - 11:00 to 11:40

Kesselhaus

long talk (40 min)

Intermediate

Session abstract:

Writing software for the Internet puts engineers face to face with traffic that makes doing even simple things difficult. Seemingly straightforward operations such as counting, set membership and set cardinality become either extremely slow or prohibitively expensive to do using traditional methods.

Help is at hand, however: probabilistic techniques are both fascinating and extremely useful. By sacrificing a predictable amount of accuracy, we can perform operations at scales we never thought possible, and fast! We'll introduce approaches such as Bloom filters for set membership, count-min sketch for frequency in streams, and HyperLogLog for cardinality. No maths PhD is required.

We'll look at the before and after effects of using these techniques in real-world scenarios, and present the libraries that you can go away and play with right now.

Video:

#bbuzz 2016: James Stanier - Acceptably inaccurate: probabilistic data structures

Slide:

probabilistic_data_structures.pdf

Berlin Buzzwords

Acceptably inaccurate: probabilistic data structures

Session abstract:

Video:

#bbuzz 2016: James Stanier - Acceptably inaccurate: probabilistic data structures

Slide:

probabilistic_data_structures.pdf

Newsletter

Partners

Platinum Partner

Past conferences