Online learning, Vowpal Wabbit and Hadoop

06/02/2015 - 15:20 to 16:00
Stage 1
long talk (40 min)

Session abstract: 

Online learning has recently attracted a lot of attention following several competitions, especially after Criteo released an 11 GB training set for a Kaggle contest.
Online learning makes it possible to process massive datasets: the learner processes examples sequentially, using little memory and limited CPU resources. It is also particularly well suited to handling time-evolving data.
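
To make the sequential, low-memory idea concrete, here is a minimal sketch of an online learner in Python. It is an illustration only, not Vowpal Wabbit's implementation: it assumes a binary label, plain stochastic gradient descent on the logistic loss, and string features hashed into a fixed-size weight table. The names `hash_features`, `predict`, `train_online`, `NUM_BITS` and the learning rate are all hypothetical choices made for this example.

```python
import math
import zlib
from typing import Iterable, List, Tuple

NUM_BITS = 18          # assumed table size: 2**18 weights, fixed up front
DIM = 1 << NUM_BITS


def hash_features(tokens: Iterable[str]) -> List[int]:
    """Map string features to indices in the fixed-size weight table."""
    return [zlib.crc32(t.encode("utf-8")) % DIM for t in tokens]


def predict(weights: List[float], indices: List[int]) -> float:
    """Logistic prediction for one example with binary (present) features."""
    z = sum(weights[i] for i in indices)
    z = max(-35.0, min(35.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_online(
    stream: Iterable[Tuple[int, List[str]]],
    learning_rate: float = 0.1,
) -> List[float]:
    """Single pass over the stream; each example is seen once, then dropped.

    Memory use is bounded by the weight table (DIM floats), regardless of
    how many examples the stream yields.
    """
    weights = [0.0] * DIM
    for label, tokens in stream:
        indices = hash_features(tokens)
        error = predict(weights, indices) - label  # gradient of logistic loss
        for i in indices:
            weights[i] -= learning_rate * error
    return weights
```

Because `train_online` only needs an iterable, the stream can be a generator reading a file line by line, so the full dataset never has to fit in memory. Updating the model one example at a time is also what lets it follow data that drifts over time.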
Vowpal Wabbit has become quite popular: it is a handy, lightweight and efficient command-line tool for online learning on gigabytes of data, even on a standard laptop with standard memory. After a brief reminder of the principles of online learning, we present how to run Vowpal Wabbit on Hadoop in a distributed fashion.