Using Random Projections to Make Sense of High-Dimensional Big Data

Track: Search
Date/time: 06/02/2015, 16:30 to 17:10
Format: long talk (40 min)
Level: Intermediate

Session abstract: 

It is hard to understand what is hidden in big, high-dimensional data. However, a moderate number of simple one-dimensional projections is enough to answer hard questions about the data via techniques such as visualization, classification, and clustering. Random projections have emerged as an extremely effective component of many algorithms for high-dimensional data: for example, they are used in nearest-neighbor search (via locality-sensitive hashing), dimensionality reduction, and clustering. The goal of the talk is to give a pleasant journey into the rich area of random projections via many graphical illustrations and intuitive examples. We present how and why random projections work and where they break. We discuss several interesting properties of high-dimensional data: for example, why data in high dimensions is likely to look Gaussian when projected into low dimensions; how to spot interesting patterns in high-dimensional data by projecting into a lower dimension; and how to choose meaningful low-dimensional projections. The method of random projections has a number of good properties: 1) it is scalable; 2) it reduces the machine-learning problem to search and can take advantage of existing infrastructure; 3) it is relatively simple to implement; and 4) it is robust to noisy data.
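As a minimal sketch of the ideas mentioned above (not code from the talk; it assumes NumPy and uses made-up data, and all names such as X, R, and H are illustrative), the snippet below draws random Gaussian directions and uses them three ways: a single one-dimensional projection whose values tend to look roughly Gaussian, a Johnson-Lindenstrauss-style reduction that roughly preserves pairwise distances, and a random-hyperplane locality-sensitive hash:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic high-dimensional data: 1000 points in 500 dimensions.
n, d = 1000, 500
X = rng.random((n, d))

# One random 1-D projection: a unit vector with i.i.d. Gaussian entries.
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
proj = X @ v  # one scalar per point
# By a central-limit-style argument, these values often look
# approximately Gaussian for high-dimensional data.
print("1-D projection mean/std:", proj.mean(), proj.std())

# Johnson-Lindenstrauss-style reduction: k random projections at once,
# scaled by 1/sqrt(k) so pairwise distances are roughly preserved.
k = 32
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

# Compare one pairwise distance before and after the reduction.
orig = np.linalg.norm(X[0] - X[1])
red = np.linalg.norm(Y[0] - Y[1])
print(f"distance before: {orig:.3f}  after: {red:.3f}")

# Locality-sensitive hashing via random hyperplanes: the sign pattern
# of a few random projections acts as a hash code; points separated by
# a small angle tend to share more sign bits.
H = rng.standard_normal((d, 16))
codes = (X @ H) > 0  # one 16-bit boolean code per point
```

Even with k far smaller than d, the printed before/after distances should roughly agree, which is the property the nearest-neighbor and clustering applications rely on.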

Link to the talk website: http://stefansavev.com/randomtrees/
