Relevance metrics like NDGC or ERR require graded judgements to evaluate query relevance performance. But what happens when we don't know what 'good' looks like ahead of time? This talk will look at using click modeling techniques to infer relevance judgements from user interaction logs.