Following the news from the big data world, you might get the impression that Java MapReduce is way past its prime and that newer frameworks such as Spark or Flink are the way to go, no questions asked.
Having run big data workloads in production on some of the largest web portals in Germany since 2007, I will argue that in many use cases, the choice of parallelization framework matters less, from an implementation or performance standpoint, than you might think. Understanding your application domain, implementing a sound domain model, and optimizing your data flows based on that domain knowledge will do far more to improve performance and produce elegant code than switching to the latest distributed computing framework.
In this session, I will discuss use cases from our production systems and show how to implement business logic that is efficient yet agnostic of the actual computing framework. I will demonstrate how such logic interacts with MapReduce, how other frameworks such as Spark could work in its stead, and how domain-specific optimizations can work hand in hand with the computing framework of choice.
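To make the framework-agnostic idea concrete, here is a minimal sketch (all class and method names are hypothetical, not taken from the session itself): the domain logic lives in a plain Java class with no Hadoop or Spark types, so the same method can be invoked from a Hadoop Reducer, a Spark closure, or an ordinary unit test.

```java
import java.util.List;

// Hypothetical example of framework-agnostic domain logic:
// a pure aggregation over plain Java types, with no Writables,
// Contexts, or RDDs leaking into the business code.
public class SessionStats {

    // Aggregate per-session page-view durations into a total.
    // Because the method only sees plain Java types, it is equally
    // callable from a MapReduce reduce() method, a Spark
    // reduceByKey/mapValues closure, or a local test.
    public static long totalDuration(List<Long> durationsMillis) {
        long total = 0;
        for (long d : durationsMillis) {
            total += d;
        }
        return total;
    }

    public static void main(String[] args) {
        // Plain-Java driver standing in for whichever framework
        // feeds the values in production.
        long total = totalDuration(List.of(120L, 450L, 30L));
        System.out.println("total duration: " + total + " ms");
    }
}
```

The design choice this sketch illustrates: the parallelization framework is reduced to a thin adapter that shuffles and groups the data, while all domain knowledge sits in code that can be tested and optimized independently of Hadoop or Spark.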