Kevin Teh Insight presentation

Transcript

  • 1. Disambiguating Twitter Search. Kevin Teh, kkwteh@gmail.com. Insight Data Science Fellows Program, March 2013. Tuesday, February 26, 2013
  • 2. That’s not the python that I meant...
  • 3. The solution? cluster-pluck.
  • 4. cluster-pluck disambiguates Twitter search in real time
  • 5. It works in Spanish too!
  • 7. Tools (diagram labels): Word Filter; Web Application; 300,000 Tweets; Filter; User
  • 8. Algorithm (flowchart):
      - read query and download corpus of 1500 tweets
      - filter out common words
      - count
      - select potentially meaningful words: rank remaining words by number of occurrences and select top 10; rank remaining words by rate of capitalization and select top 10
      - cluster candidates into groups: link two candidate words if their relative proportion of co-occurrence is greater than 0.25; rank connected components by total occurrences and take top 3
      - assign tweets to clusters
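    The flowchart on slide 8 can be sketched roughly in Python as below. This is a minimal illustration, not the cluster-pluck source. Several details are assumptions the slides do not specify: the stop-word list, the tokenizer, dropping the query word itself, treating the two top-10 rankings as parallel and taking their union, defining "relative proportion of co-occurrence" as shared tweets divided by the tweet count of the rarer word, and assigning each tweet to the cluster it shares the most words with. All function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; the slides do not say which common words were filtered.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "i", "my", "for", "on", "with", "that", "this"}
TOKEN_RE = re.compile(r"[A-Za-z']+")  # assumed tokenizer: runs of letters


def tokenize(tweet):
    return TOKEN_RE.findall(tweet)


def select_candidates(tweets, query, top_n=10):
    """Return (candidate words, occurrence counts) after filtering common words."""
    counts, capitalized = Counter(), Counter()
    for tweet in tweets:
        for token in tokenize(tweet):
            word = token.lower()
            # Assumption: the query word is dropped, since it appears in every tweet.
            if word in STOP_WORDS or word == query.lower():
                continue
            counts[word] += 1
            if token[0].isupper():
                capitalized[word] += 1
    by_count = [w for w, _ in counts.most_common(top_n)]
    by_caps = sorted(counts, key=lambda w: capitalized[w] / counts[w],
                     reverse=True)[:top_n]
    # Assumption: the two rankings run in parallel and their union is kept.
    return set(by_count) | set(by_caps), counts


def cluster_words(tweets, candidates, counts, threshold=0.25, top_k=3):
    """Link co-occurring candidates, then return the top connected components."""
    word_sets = [{t.lower() for t in tokenize(tw)} & candidates for tw in tweets]
    tweet_freq = Counter(w for words in word_sets for w in words)
    together = Counter()
    for words in word_sets:
        for pair in combinations(sorted(words), 2):
            together[pair] += 1
    neighbours = {w: set() for w in candidates}
    for (a, b), n in together.items():
        # Assumed definition of "relative proportion of co-occurrence".
        if n / min(tweet_freq[a], tweet_freq[b]) > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(candidates):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first search over the link graph
            word = stack.pop()
            if word not in component:
                component.add(word)
                stack.extend(neighbours[word] - component)
        seen |= component
        components.append(component)
    components.sort(key=lambda c: sum(counts[w] for w in c), reverse=True)
    return components[:top_k]


def assign_tweets(tweets, clusters):
    """Label each tweet with the index of its best-matching cluster, or None."""
    labels = []
    for tweet in tweets:
        words = {t.lower() for t in tokenize(tweet)}
        overlaps = [len(words & c) for c in clusters]
        best = max(range(len(clusters)), key=overlaps.__getitem__, default=None)
        labels.append(best if best is not None and overlaps[best] > 0 else None)
    return labels


def disambiguate(tweets, query):
    candidates, counts = select_candidates(tweets, query)
    clusters = cluster_words(tweets, candidates, counts)
    return clusters, assign_tweets(tweets, clusters)
```

    Calling `disambiguate(tweets, "python")` on a list of tweet strings returns up to three word clusters and one cluster label per tweet. Fetching the corpus of 1500 tweets is left out, because the slides do not say how the download was done.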
  • 9. Kevin Teh, kkwteh@gmail.com. Math PhD, May ’13. Topic: Noncommutative Geometry (whatever that is). B.A.Sc., April ’07: Engineering Science (whatever that is).