Real Time Marketing with Kafka, Storm, Cassandra and a pinch of Spark

Scale
06/07/2016 - 15:20 to 16:00
Maschinenhaus
long talk (40 min)
Beginner

Session abstract: 

The combination of Apache Kafka as an event bus, Apache Storm for real- or near-time processing, Apache Cassandra as an operational storage layer, and Apache Spark for analytical queries against this storage turned out to be an extremely well-performing system.

With increasing marketing costs per registration, it becomes even more important to keep players in the game and to provide them with attractive offers that increase customer lifetime value and create a better game experience.

To that end, we at InnoGames introduced interstitials that offer players premium features or discounts. Even though this is already a useful instrument, we aimed to customize those interstitials according to the behavior of each player. Therefore, we created a system that processes generic messages containing data about user interactions in real- or near-time -- referred to as events below. The system builds up a player profile that collects all game-relevant information about a player in a central location.
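As a rough illustration of such a generic message, an event could be modeled as follows; the field names and event types are purely hypothetical, not InnoGames' actual schema:

```java
import java.time.Instant;
import java.util.Map;

// Hypothetical shape of a generic user-interaction event.
// Field names and event types are illustrative only.
public final class PlayerEvent {
    public final String playerId;              // cross-game player identifier
    public final String game;                  // which game emitted the event
    public final String type;                  // e.g. "login", "level_up", "purchase"
    public final Instant occurredAt;           // when the interaction happened
    public final Map<String, Object> payload;  // event-specific attributes

    public PlayerEvent(String playerId, String game, String type,
                       Instant occurredAt, Map<String, Object> payload) {
        this.playerId = playerId;
        this.game = game;
        this.type = type;
        this.occurredAt = occurredAt;
        this.payload = payload;
    }
}
```

Keeping the payload generic lets every game publish its own interaction data to the same bus without schema changes downstream.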

The system is also able to react to events, fetch information about the corresponding player from the profile in a matter of milliseconds and send out an interstitial based on this information.

The system consists of an Apache Kafka component used as an event bus, an Apache Storm topology that updates the profile and triggers marketing actions based on events, and an Apache Cassandra cluster that serves as the storage layer for the profile. In addition, we set up an Apache Spark cluster alongside Cassandra to run analytical queries against the data in Cassandra, following this article from DataStax (https://academy.datastax.com/demos/getting-started-apache-spark-and-cassandra). This combination is already quite efficient, but we additionally added the Spark REST JobServer (https://github.com/spark-jobserver/spark-jobserver) and extended it so that we can read the player profile from Cassandra, cache it within a Spark context, and reuse that context across several jobs. This speeds up analytical queries significantly.
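The read-and-cache step might look roughly like the following sketch. It assumes the spark-cassandra-connector is on the classpath and a long-lived SparkSession managed by the JobServer; the keyspace and table names are illustrative, not the actual schema:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ProfileCache {
    // Loads the player profile table through the Spark Cassandra connector
    // and caches it in the shared context, so that subsequent jobs submitted
    // via the JobServer reuse the cached DataFrame instead of re-reading
    // Cassandra. "profiles"/"player_profile" are hypothetical names.
    public static Dataset<Row> loadProfiles(SparkSession spark) {
        return spark.read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "profiles")
                .option("table", "player_profile")
                .load()
                .cache();
    }
}
```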

The whole stack combines fast event-based operations with powerful analytical queries using Spark DataFrames. The concept can therefore be applied not only to a marketing system like the one we built, but also to a variety of other use cases. Another key technology used within the system is the Nashorn JavaScript engine included with Java 8. It is used within Storm to work with the player profile and the incoming events. In this way we can define new marketing actions at runtime and have a very flexible, generic data-processing bolt in our Storm topology.
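Defining actions at runtime via Nashorn can be sketched like this; the class name, method names, and the sample script are assumptions for illustration, and note that Nashorn ships with Java 8 through 14 but was removed from later JDKs:

```java
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import java.util.Map;

// Hypothetical helper a Storm bolt could delegate to: marketing actions
// arrive as JavaScript source at runtime and are evaluated against the
// player profile and the incoming event.
public class ActionEngine {
    private final ScriptEngine engine =
            new ScriptEngineManager().getEngineByName("nashorn");

    // Register a marketing action defined as a JS function, at runtime.
    public void defineAction(String jsSource) throws ScriptException {
        engine.eval(jsSource);
    }

    // Apply a previously defined action to a profile and an event.
    public Object apply(String functionName, Map<String, Object> profile,
                        Map<String, Object> event) throws Exception {
        return ((Invocable) engine).invokeFunction(functionName, profile, event);
    }
}
```

Because the scripts are plain strings, new actions can be deployed without restarting the topology, which is what makes the processing bolt generic.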

Video: 

Slides: