Scoring for human beings

05/27/2014 - 16:30 to 17:10

Kesselhaus

long talk (40 min)

Beginner

Session abstract:

When you are running elasticsearch for free text search, you probably use Lucene's tf-idf scoring formula to determine the relevancy of a document. This usually works well, because the formula is one-size-fits-most for free text queries. But what if you are not one of the "most"? And how can you tell when you are not?

In this talk, I will explain the basics of determining the relevancy of a document and how scores can be customized when using elasticsearch.

I will start off by recapitulating the vector space model for scoring and how tf-idf works in detail - for human beings. This explanation will be accompanied by practical examples of pitfalls you might encounter when the scored text actually represents tags. I will then give an overview of the options in elasticsearch for tweaking scores arbitrarily, using numerical document values as well as text features stored in the Lucene index. Finally, I will show examples of how you can implement your own flavor of scoring functions such as tf-idf, language model and cosine similarity without touching a single line of Java code.
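To give a feel for the kind of tag pitfall mentioned above, here is a minimal Python sketch of a simplified tf-idf score. It is an illustration, not the talk's material and not Lucene's full formula: it assumes tf = sqrt(term frequency), idf = 1 + ln(numDocs / (docFreq + 1)) and a length norm of 1 / sqrt(number of terms), and it leaves out query normalization, coordination factors and boosts. The corpus and the names `docs`, `idf` and `score` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each "document" is just a list of tags.
docs = {
    "short": ["python"],
    "long": ["python", "search", "lucene", "java"],
}

def idf(term, docs):
    # Assumed idf: 1 + ln(numDocs / (docFreq + 1))
    doc_freq = sum(1 for tags in docs.values() if term in tags)
    return 1.0 + math.log(len(docs) / (doc_freq + 1))

def score(term, tags, docs):
    # Assumed tf: square root of the raw term frequency
    tf = math.sqrt(Counter(tags)[term])
    # Assumed length norm: documents with more terms are penalized
    norm = 1.0 / math.sqrt(len(tags))
    return tf * idf(term, docs) * norm

short_score = score("python", docs["short"], docs)
long_score = score("python", docs["long"], docs)

# Both documents carry the tag exactly once, yet the document with
# fewer tags scores twice as high because of the length norm.
print(short_score > long_score)
```

For free text, penalizing long documents is often reasonable. For tags, it means a carefully tagged document ranks below a sparsely tagged one, which is rarely what you want.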