Information Retrieval Challenges

Invited lecture at POSI (Post Graduation on Information Systems at INESC, Lisbon, Portugal) about Information Retrieval Challenges: real-time retrieval, context awareness and inferring identity from content.

Published in: Technology

  • Google Chrome plugin
    - Get identity information from the Google Social Graph API or other means


  • Transcript

    • 1. Information Retrieval Challenges Bruno Pedro March 2010
    • 2. Bruno Pedro An experienced Web developer and entrepreneur. Has extensive background in large scale projects and technical writing. http://tarpipe.com/user/bpedro
    • 3. What is tarpipe? User
    • 4. What is tarpipe?
    • 5. 3 Challenges • Real-Time Retrieval • Understanding Context • Inferring Identity
    • 6. Real-Time Retrieval http://www.flickr.com/photos/josephrobertson/127758523/
    • 7. WordPress source: wordpress.com • Average ~10K posts/hour • ~3 new posts every second
    • 8. twitter source: mashable.com • Average ~1.1M tweets/hour • ~300 new tweets every second
    • 9. Challenge • ~300 reads/second • 160 bytes × 300 = 48 KB/second ≈ 4 GB/day (approximate calculation) • How to process all this information?
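
The estimate on slide 9 can be checked with a few lines of Python. This is only a sketch: it assumes the 160 on the slide is the size of one message in bytes and uses decimal units (1 KB = 1000 bytes). The constant names are illustrative and do not come from the talk.

```python
# Back-of-the-envelope check of the numbers on slide 9.
# Assumption: 160 is the size of a single message in bytes.
BYTES_PER_MESSAGE = 160
MESSAGES_PER_SECOND = 300
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = BYTES_PER_MESSAGE * MESSAGES_PER_SECOND
bytes_per_day = bytes_per_second * SECONDS_PER_DAY

# Decimal units: 1 KB = 1000 bytes, 1 GB = 1000**3 bytes.
kb_per_second = bytes_per_second / 1000
gb_per_day = bytes_per_day / 1000**3

print(f"{kb_per_second:.0f} KB/second")
print(f"{gb_per_day:.2f} GB/day")
```

The daily figure comes out slightly above 4 GB, which matches the slide's "approximate calculation".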
    • 10. Strategy • Read and store immediately: • High performance write storage • No locks allowed • Prepare for lots of reading errors
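
One possible reading of "read and store immediately" is sketched below. The slides do not say which storage was used, so an append-only file stands in for the "high performance write storage". The function names `store_raw` and `ingest`, and the idea of modelling each read as a callable that may raise, are assumptions made for illustration.

```python
import os
import tempfile


def store_raw(fd, payload):
    """Append one raw record with a single write call: no parsing, no locking."""
    record = payload.replace(b"\n", b" ") + b"\n"
    os.write(fd, record)


def ingest(source, path):
    """Store everything that can be read; count failed reads and move on."""
    stored = failed = 0
    # O_APPEND makes every write land at the end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        for read in source:
            try:
                payload = read()  # each item is a callable that may raise
            except OSError:
                failed += 1  # reading errors are expected, not fatal
                continue
            store_raw(fd, payload)
            stored += 1
    finally:
        os.close(fd)
    return stored, failed


# Usage with two good reads and one simulated network failure.
def ok(data):
    return lambda: data


def broken():
    raise OSError("connection reset")


path = os.path.join(tempfile.mkdtemp(), "raw.log")
stored, failed = ingest([ok(b"first post"), broken, ok(b"second\npost")], path)
print(stored, failed)
```

Nothing is interpreted at write time; the raw records are kept as they arrive so that the heavier processing can run later.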
    • 11. Strategy • Process later: • Regular expressions • Term extraction • Machine learning
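
The "process later" step could look roughly like the sketch below, which covers only the first two bullets (regular expressions and a very naive term extraction). The patterns, the stopword list and the `extract` function are illustrative assumptions, not the approach used in the talk, and the machine learning step is left out.

```python
import re
from collections import Counter

# Illustrative patterns; real messages need more careful rules.
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
WORD = re.compile(r"[a-z']+")
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "in", "for", "on"}


def extract(message):
    """Pull URLs, mentions and hashtags out of a message, then count the remaining terms."""
    urls = URL.findall(message)
    text = URL.sub(" ", message)
    mentions = MENTION.findall(text)
    hashtags = HASHTAG.findall(text)
    text = HASHTAG.sub(" ", MENTION.sub(" ", text))
    words = [w for w in WORD.findall(text.lower()) if w not in STOPWORDS]
    return {
        "urls": urls,
        "mentions": mentions,
        "hashtags": hashtags,
        "terms": Counter(words),
    }


result = extract("Reading about search with @bpedro http://tarpipe.com #search search")
print(result["mentions"], result["hashtags"], result["urls"])
print(result["terms"].most_common(3))
```

Because this runs on stored data, it can be rerun with better patterns without touching the ingestion path.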
    • 12. Context For me context is the key - from that comes the understanding of everything. — Kenneth Noland
    • 13. source: joelonsoftware.com Dogs?
    • 14. source: Google Reader Play Dogs with unmatched title?
    • 15. source: Google Buzz Still doesn’t make a lot of sense...
    • 16. This is the worst case scenario
    • 17. Challenge • Find context from associated content: • Pictures • Comments • Location information • Timelines • Authors
    • 18. Strategy • Associate content through common identifiers • Establish timeline of different pieces • Group pieces by same author • Present in a comprehensible fashion
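
The first three bullets of this strategy can be sketched as follows. The `Piece` record, its fields and the example URL are hypothetical; the talk does not describe a data model, so this only shows one way to associate pieces through a shared identifier, order them in time and group them by author.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Piece:
    kind: str          # e.g. "picture" or "comment"
    author: str
    created: datetime
    refs: frozenset    # identifiers this piece shares with others, e.g. URLs


def by_identifier(pieces):
    """Associate pieces through common identifiers; each group is a timeline."""
    groups = defaultdict(list)
    for piece in pieces:
        for ref in piece.refs:
            groups[ref].append(piece)
    for group in groups.values():
        group.sort(key=lambda p: p.created)
    return dict(groups)


def by_author(timeline):
    """Group the pieces of one timeline by author, keeping time order."""
    authors = defaultdict(list)
    for piece in timeline:
        authors[piece.author].append(piece)
    return dict(authors)


photo = "http://example.com/photo/1"  # hypothetical shared identifier
pieces = [
    Piece("comment", "bob", datetime(2010, 3, 1, 12, 5), frozenset({photo})),
    Piece("picture", "alice", datetime(2010, 3, 1, 12, 0), frozenset({photo})),
    Piece("comment", "alice", datetime(2010, 3, 1, 12, 9), frozenset({photo})),
]

timeline = by_identifier(pieces)[photo]
print([p.kind for p in timeline])
print({author: len(items) for author, items in by_author(timeline).items()})
```

Presenting the result "in a comprehensible fashion" is a display problem and is not covered here.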
    • 19. Identity source: abc Australia
    • 20. Many Identifiers • E-mail: user@example.com • facebook: @User Name • flickr: user or User Name (?) • Google Buzz: @user@example.com • twitter: @user ...
    • 21. Addressable • http://facebook.com/user • http://flickr.com/user • http://www.google.com/profiles/user • http://twitter.com/user ...
    • 22. How to make sense?
    • 23. Challenge • Parse every message, tweet or post • Find possible user identifiers • Substitute for meaningful information: • A link to the original profile • Equivalent identity on destination
    • 24. Strategy • Decentralized processing: • Browser based (plugin) • Extract identities from page • Process • Replace with meaningful information
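
The extract, process and replace steps can be illustrated with a small sketch. A browser plugin would be written in JavaScript; Python is used here only to show the idea. The profile URL templates follow slide 21, while the `MENTION` pattern, the `link_mentions` function and the choice to emit an HTML link are assumptions. Only the simple `@user` form is handled; e-mail addresses and the Google Buzz `@user@example.com` form are deliberately left untouched.

```python
import re

# Profile URL templates taken from the "Addressable" slide.
PROFILE_URLS = {
    "facebook": "http://facebook.com/{user}",
    "flickr": "http://flickr.com/{user}",
    "google": "http://www.google.com/profiles/{user}",
    "twitter": "http://twitter.com/{user}",
}

# Matches @user, but not user@example.com or @user@example.com.
MENTION = re.compile(r"(?<![\w@])@(\w+)(?![\w@])")


def link_mentions(text, service):
    """Replace each @user with a link to that user's profile on `service`."""
    template = PROFILE_URLS[service]

    def replace(match):
        user = match.group(1)
        url = template.format(user=user)
        return f'<a href="{url}">@{user}</a>'

    return MENTION.sub(replace, text)


print(link_mentions("thanks @bpedro!", "twitter"))
print(link_mentions("mail user@example.com", "twitter"))
```

Mapping an identifier to the equivalent identity on a different service needs an external lookup, which this sketch does not attempt.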
    • 25. Food for thought • PubSubHubbub http://code.google.com/p/pubsubhubbub/ • Activity Streams http://activitystrea.ms/ • WebFinger http://webfinger.org/
    • 26. "tarpipe streamlines your updates to various social web sites, creating simple or complex workflows to update several buckets in one fell swoop." Adam Pash, lifehacker • "tarpipe is one of the most curious experiments in social media that I've seen lately. The service has the potential to be the answer to the lament I first talked about in The looming crisis: Personal syndication overload." Rafe Needleman, CNET news • "Today I had a chance to spend time experimenting with tarpipe and I have to say that I am intrigued by the concept and impressed by the implementation." Jeff Barr, Amazon.com • thank you • share your life
