With 100M+ products in a single Solr index it is hard work to keep response times as low as possible. A key factor is decreasing the index size and the number of terms indexed. We try to store the significant terms only. For long product descriptions this is challenging. Applying NLP at index time is a costly operation. In this talk I’ll present a smart algorithm that run fast enough to be applied at index time.