The Wikimedia Foundation is a non-profit and charitable organization driven by a vision of a world where every human can freely share in the sum of all knowledge. Each month Wikimedia sites serve over 18 billion page views to 500 million unique visitors around the world.
Among the many resources offered by Wikimedia is a public-facing API that provides low-latency, programmatic access to full-history content and meta-data, in a variety of formats. Commonly, results from this system are the product of computationally intensive transformations, and must be pre-generated and persisted to meet latency expectations. Unsurprisingly, there are numerous challenges to providing low-latency storage of such a massive data-set, in a demanding, globally distributed environment.
In this talk, we will cover the Wikimedia content API, and it's use of Apache Cassandra, a massively-scalable distributed database, as storage for a diverse and growing set of use-cases. Trials, tribulations, and triumphs, of both a development and operational nature will be discussed.