15. Relevance Models: VSM
TF-IDF
For term i in document j:
$w_{i,j} = tf_{i,j} \times \log(N / df_i)$
$tf_{i,j}$ = number of occurrences of i in j
$df_i$ = number of documents containing i
$N$ = total number of documents
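The weighting above can be sketched in a few lines of Python. This is a minimal illustration, not a production scorer: the function name `tf_idf`, the toy corpus, and the choice of the natural logarithm are assumptions (the slide does not state a log base), and documents are assumed to be already tokenized.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute w[i,j] = tf[i,j] * log(N / df[i]) for every term i in every document j.

    docs is a list of token lists; the result is one {term: weight} dict per document.
    The natural log is an assumption; the slide does not specify a base.
    """
    n_docs = len(docs)

    # df[i]: number of documents containing term i (each document counts once).
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # tf[i]: occurrences of term i in this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["search", "engine", "search"],
    ["search", "index"],
    ["index", "shard"],
]
weights = tf_idf(docs)
```

A term that appears in every document gets weight 0, since log(N/N) = 0, which is how the formula discounts terms that do not discriminate between documents.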
17. Distributed Architecture
1 Master - N Slaves
good for scaling queries
not good for scaling data
Sharded index with replication
good for scaling queries
good for scaling data
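To illustrate why sharding with replication scales both ways, here is a minimal in-memory sketch. Everything in it is hypothetical (the `ShardedIndex` class, CRC32-based routing, round-robin replica selection) and it is not taken from any particular search engine. Each document is stored in exactly one shard, so adding shards grows data capacity; each shard has several identical replicas, so adding replicas grows query capacity.

```python
import zlib


class ShardedIndex:
    """Toy sharded inverted index with replication (illustrative assumption only)."""

    def __init__(self, num_shards, replicas_per_shard):
        # shards[s][r] is replica r of shard s: a dict mapping term -> set of doc ids.
        self.shards = [
            [{} for _ in range(replicas_per_shard)] for _ in range(num_shards)
        ]
        # Round-robin pointer per shard, used to spread queries across replicas.
        self._next_replica = [0] * num_shards

    def _shard_for(self, doc_id):
        # Deterministic routing: a document always maps to the same shard.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def add(self, doc_id, tokens):
        # Write the document to every replica of its single owning shard.
        for replica in self.shards[self._shard_for(doc_id)]:
            for term in tokens:
                replica.setdefault(term, set()).add(doc_id)

    def search(self, term):
        # Fan out to one replica of every shard and merge the partial results.
        hits = set()
        for s, replicas in enumerate(self.shards):
            r = self._next_replica[s]
            self._next_replica[s] = (r + 1) % len(replicas)
            hits |= replicas[r].get(term, set())
        return hits


idx = ShardedIndex(num_shards=2, replicas_per_shard=2)
idx.add("doc1", ["search", "engine"])
idx.add("doc2", ["search", "index"])
idx.add("doc3", ["index", "shard"])
hits = idx.search("search")
```

In the 1 Master - N Slaves layout, by contrast, every node holds the full index: extra slaves can absorb more queries, but the index can never grow beyond what a single node can store.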