The document summarizes a project to create an expert-finding search engine using Twitter data. The goals are to collect tweets from 3000 users (over 1000 tweets each) in particular subject areas and save them to a MySQL database. The search engine then lets users query this data and returns related experts. It uses both a traditional vector space model and a language model with Dirichlet smoothing to calculate relevance scores and rank results. Sample retrieval scenarios and evaluations of the results are provided to illustrate the system.
2. • We need to create an "expert finding" search engine. When
people use Google and Bing, they have access to numerous web
documents. However, there are many occasions when finding the
right people is more important than finding the right
documents. Our project is to create a search engine whose data
comes from Twitter. Our goal is to collect the Twitter data of
experts in a particular area and save it in a MySQL database.
With this data, when a user types a query in the input box, the
search engine returns the related experts' information as
the result.
Interpretation of Project
4. • The users will be experts or students who are
interested in the information retrieval area and want to
find information about IR experts.
• Data Input & Output
Goals
7. • Traditional Vector Space Model:
• In the traditional method, we calculate the tf and idf for the query and the document. After
that, we calculate the term weights of the query and of the document. Finally, we take
the product of the query and document weights for each term and sum the products. In the
example, the final score for the query against the document is 0 + 0 + 0.82 + 2.46 = 3.28.
The query is then scored the same way against every document in the collection. At last,
all documents are ranked by their scores.
Approach Steps
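The vector space scoring steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual implementation: the function names (`tokenize`, `build_idf`, `weights`, `score`, `rank`), the sample documents, and the specific weighting scheme (log-scaled tf multiplied by log10(N/df)) are assumptions, since the slides do not state which tf-idf variant was used.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def build_idf(docs):
    # idf(t) = log10(N / df(t)), where df(t) is the number of documents containing t.
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    return {term: math.log10(n / count) for term, count in df.items()}


def weights(tokens, idf):
    # Assumed weighting: (1 + log10(tf)) * idf; terms unseen in the collection get 0.
    tf = Counter(tokens)
    return {t: (1 + math.log10(c)) * idf.get(t, 0.0) for t, c in tf.items()}


def score(query_w, doc_w):
    # Sum of per-term products of query weight and document weight.
    return sum(w * doc_w.get(t, 0.0) for t, w in query_w.items())


def rank(query, raw_docs):
    # Score the query against every document, then sort by descending score.
    docs = {doc_id: tokenize(text) for doc_id, text in raw_docs.items()}
    idf = build_idf(docs)
    q_w = weights(tokenize(query), idf)
    scores = {d: score(q_w, weights(toks, idf)) for d, toks in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical mini-collection standing in for stored tweet text.
    sample = {
        1: "information retrieval and search engines",
        2: "cooking recipes and kitchen tips",
        3: "retrieval models for expert search",
    }
    for doc_id, s in rank("expert retrieval", sample):
        print(doc_id, round(s, 4))
```

Each query term contributes one product to the sum, which mirrors the per-term breakdown in the example above; terms that are absent from a document contribute 0.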
11. Evaluation of the retrieval results:
                                Vector Space Model    Language Model
Result return time (nanosec)    67629667000           136555119000
Precision                       Precise               More precise
Return values (Vector Space Model):
882=236.07245706131792
3821=209.29423849205102
877=146.25158485743512
3811=133.04793510520034
3833=127.16441840114109
3386=112.38369647213892
3719=106.16029341775558
3216=91.17045922756672
3797=90.50222759464644