Twet

Twet Technical Report

Published in: Education

  1. Twet

     Anca Antochi (aantochi@infoiasi.ro), Lucian Pricop (lucian.gabriel.pricop@gmail.com), Radu Sarghie (rsarghie@infoiasi.ro)

     Abstract. By now, many people have become familiar with internet search engines. Most internet users can easily find the information they need by typing a word into a search engine and reading the search results. However, the web today also offers a wide variety of specialized applications that allow us to search specific domains of interest. These applications can be combined in so-called mash-ups, which group the search results provided by several of them. Twet is an effort to combine the results of searches in the Twitter micro-blogging network and the Flickr photo sharing service. To make our application more user-friendly, the search is extended to the synonyms of the search word (using the WordNet lexical database), and the result is combined with the Yahoo mapping service to show the most recent tweets and relevant photos about the topic in question.

     Keywords: twitter, mash-up, flickr, wordnet, yahoo maps.

     1 Introduction to the technologies used

     1.1 Twet - Project Description

     Twet is a search tool that combines posts from the Twitter micro-blogging network, displayed as an overlay on Yahoo Maps, with the Flickr photo sharing service to give you the most relevant tweets and photos about a search term. To make the application more user-friendly, the search is extended to the synonyms of the search word by using the WordNet lexical database. The project consists of the main Twet web application, which can be deployed on any ASP.NET-enabled server, and two PHP web services named Twet-Wordnet and Twet-Twitter.
  2. 1.2 Twitter

     According to Wikipedia, Twitter is a free social networking and microblogging service that enables its users to send and read messages known as tweets. Tweets are text-based posts of up to 140 characters displayed on the author's profile page and delivered to the author's subscribers, who are known as followers. Senders can restrict delivery to those in their circle of friends or, by default, allow open access. Users can send and receive tweets via the Twitter website, Short Message Service (SMS) or external applications. Since its creation in 2006, Twitter has gained notability and popularity worldwide. It is sometimes described as the "SMS of the Internet", since the use of Twitter's application programming interface for sending and receiving short text messages by other applications often eclipses the direct use of Twitter.

     [Figure: Twitter posts example]

     Using Twitter

     Twitter exposes its data via an Application Programming Interface (API). Useful documentation of the Twitter API can be found at http://apiwiki.twitter.com/Twitter-API-Documentation.

     Searching on Twitter

     Searches on Twitter can be performed by calling the search service documented at http://apiwiki.twitter.com/Twitter-Search-API-Method%3A-search. The search URL has the form http://search.twitter.com/search.format, where format is the desired response format, such as atom.

     The search parameters of interest are:
  3. • rpp: Optional. The number of tweets to return per page, up to a maximum of 100. In our case this is set to 10. Example: http://search.twitter.com/search.atom?q=devo&rpp=10
     • page: Optional. The page number (starting at 1) to return, up to a maximum of roughly 1500 results. In our case this is always set to 1.

     Usage Notes:
     • Query strings should be URL encoded.
     • Queries are limited to 140 URL-encoded characters.
     • Some users may be absent from search results.
     • Applications must have a meaningful and unique User Agent when using this method. An HTTP Referrer is expected but not required. Search traffic that does not include a User Agent will be rate limited to fewer API calls per hour than applications that include a User Agent string.

     Finding Out Information about Twitter Users

     To find out information about Twitter users, we can use the service documented at http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-users%C2%A0show

     One of the following parameters is required:
     • id: The ID or screen name of a user.
     • user_id: Specifies the ID of the user to return. Helpful for disambiguating when a valid user ID is also a valid screen name.
     • screen_name: Specifies the screen name of the user to return.

     Usage Notes:
     • Requests for protected users without credentials from 1) the user requested or 2) a user that is following the protected user will omit the nested status element. Only publicly available data will be returned in this case.
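The search parameters and usage notes above can be combined into a small URL builder. The following is a minimal sketch in Python, not the code used by Twet: the function name `build_search_url` and its defaults are assumptions made for illustration, and only the endpoint, the `q`, `rpp` and `page` parameters and the documented limits come from the text.

```python
from urllib.parse import quote_plus, urlencode

# Endpoint pattern quoted in the text; "format" is the response format.
SEARCH_ENDPOINT = "http://search.twitter.com/search.{fmt}"
MAX_QUERY_LENGTH = 140  # limit on the URL-encoded query (usage notes)
MAX_RPP = 100           # documented maximum for rpp


def build_search_url(query, rpp=10, page=1, fmt="atom"):
    """Return a search URL for `query`, enforcing the documented limits.

    The function name and defaults are hypothetical; Twet itself always
    uses rpp=10 and page=1.
    """
    if not 1 <= rpp <= MAX_RPP:
        raise ValueError("rpp must be between 1 and %d" % MAX_RPP)
    if page < 1:
        raise ValueError("page numbers start at 1")
    if len(quote_plus(query)) > MAX_QUERY_LENGTH:
        raise ValueError("URL-encoded query exceeds %d characters" % MAX_QUERY_LENGTH)
    # urlencode takes care of URL encoding the query string.
    params = urlencode({"q": query, "rpp": rpp, "page": page})
    return SEARCH_ENDPOINT.format(fmt=fmt) + "?" + params
```

Validating before sending keeps malformed requests from counting against the rate limit mentioned in the usage notes.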
  4. 1.3 Flickr

     Flickr is an image and video hosting website, web services suite, and online community. In addition to being a popular website for users to share and embed personal photographs, the service is widely used by bloggers to host images that they embed in blogs and social media. As of October 2009, it claims to host more than 4 billion images.

     Using Flickr

     Several APIs are available for interacting with Flickr. For the purposes of this project, the Flickr.Net library was used (available at http://www.codeplex.com/FlickrNet). To get started, you need an API key for use with Flickr. You can apply for new keys and manage your existing keys in the Your Keys section of the Flickr Services web site at http://www.flickr.com/services/api/keys.

     Here is a small example of how to use Flickr.Net in C#:

         using FlickrNet;

         // Replace the placeholder with your own Flickr API key.
         Flickr flickr = new Flickr("YOUR_API_KEY");
         // Search for photos tagged "Iasi".
         PhotoSearchOptions searchOptions = new PhotoSearchOptions();
         searchOptions.Tags = "Iasi";
         Photos iasiPhotos = flickr.PhotosSearch(searchOptions);

     [Figure: Flickr photo results]
  5. 1.4 WordNet

     WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD-style license and can be downloaded and used freely. The database can also be browsed online. WordNet was created and is being maintained at the Cognitive Science Laboratory of Princeton University.

     Using WordNet

     WordNet provides an online service for searching word definitions at http://wordnetweb.princeton.edu/perl/webwn. However, using this service for our project proved difficult, both because it was slow and because the RSS feed it returned was difficult to parse in order to find the synset. Instead, we downloaded the WordNet database (found at http://www.semantilog.org/wn2sql.html#synset) and exposed a PHP web service to perform our searches.

     [Figure: The WordNet search engine]
  6. 1.5 Yahoo Maps

     The advent of web mapping can be regarded as a major new trend in cartography. Previously, cartography was restricted to a few companies, institutes and mapping agencies, requiring expensive and complex hardware and software as well as skilled cartographers and geomatics engineers. With web mapping, freely available mapping technologies and geodata potentially allow every skilled person to produce web maps, although expensive geodata and technical complexity remain barriers. Yahoo! Maps is a free online mapping portal provided by Yahoo.

     Using Yahoo Maps

     The Yahoo! Maps AJAX API lets developers add maps to their web sites using DHTML and JavaScript. Maps are fully embeddable and scriptable using the JavaScript programming language. Yahoo Maps has a built-in geocoder, which means that we can specify either a physical address or latitude/longitude coordinates for the map's location. The API documentation can be found at http://developer.yahoo.com/maps/ajax/.

     In order to use Yahoo Maps, an Application ID is needed. Yahoo provides Application IDs for free after a form is filled in at https://developer.apps.yahoo.com/wsregapp/.

     [Figure: Yahoo Maps control]
  7. 2 Twet

     2.1 Project Description

     Twet is a mash-up that combines several technologies. Its purpose is to show relevant Twitter posts (so-called "tweets") about a topic, grouped on Yahoo Maps according to the Twitter user's location. To make the application more user-friendly, synonyms of the search word are also used (relying on the Twet-Wordnet service), and the result is combined with relevant pictures fetched from the Flickr photo sharing service.

     The workflow of a Twet search is the following:
     1. The user types a search word into Twet and clicks "search".
     2. Twet calls the Twet-Twitter service with the search term as a parameter.
     3. The Twet-Twitter service calls the Twet-Wordnet service to get the synonyms of the word.
     4. Having the synonyms, the Twet-Twitter service calls Twitter to find the last 10 posts about the relevant terms.
     5. The Twet-Twitter service returns to Twet the last 10 posts on Twitter (along with meta-information such as the geo tags) and the synonyms.
     6. Twet draws an overlay on Yahoo Maps showing the desired tweets.
     7. Twet searches the Flickr photo sharing service for photos about the relevant search terms.
     8. Twet shows:
        ◦ The Yahoo Maps control with the Twitter pushpins
        ◦ The list of tweets
        ◦ The relevant Flickr photos
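Steps 3 to 7 of the workflow above can be sketched as a single orchestration function. This is an illustrative Python sketch only, not Twet's ASP.NET or PHP code: the three services are passed in as callables, the names `run_twet_search`, `get_synonyms`, `search_tweets` and `search_photos` are hypothetical, and tweets are assumed to be dictionaries with a `text` key and optional `lat`/`long` keys. For brevity the sketch flattens the nesting described above, where the synonym lookup happens inside the Twet-Twitter service.

```python
def run_twet_search(term, get_synonyms, search_tweets, search_photos, limit=10):
    """Run one search and collect everything the result page needs.

    All three callables are stand-ins (assumptions) for the real services:
      get_synonyms(term)   -> list of synonym strings
      search_tweets(terms) -> list of tweet dicts, newest first
      search_photos(terms) -> list of photo URLs
    """
    # Step 3: extend the search word with its synonyms, avoiding duplicates.
    terms = [term] + [s for s in get_synonyms(term) if s != term]
    # Steps 4-5: keep only the last `limit` tweets about the relevant terms.
    tweets = search_tweets(terms)[:limit]
    # Step 6: only tweets that carry coordinates become map pushpins.
    pins = [(t["lat"], t["long"], t["text"])
            for t in tweets if "lat" in t and "long" in t]
    # Step 7: fetch the photos for the same terms.
    photos = search_photos(terms)[:limit]
    # Step 8: everything the page displays.
    return {"terms": terms, "tweets": tweets, "pins": pins, "photos": photos}
```

Passing the services in as parameters keeps the sketch runnable without network access and makes each step easy to replace with a stub.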
  8. [Figure: Twet workflow diagram, showing Twet-Input, the Twet-Twitter Service, the Twet-Wordnet Service, Yahoo Maps, Twet, Flickr and Twet-Output]
  9. 2.2 The Twet Application

     Twet is an ASP.NET web application that receives one or more search terms as input and displays the last 10 tweets relevant to the search. The tweets are also projected on Yahoo Maps, and the result is combined with 10 relevant photos retrieved from the Flickr photo sharing service. It does so by calling the Twet-Twitter service and performing a search on Flickr.

     [Figure: The synonym list and the tweets map]
  10. [Figure: The list of tweets]

      [Figure: The Flickr photos]
  11. 2.3 ASP.NET vs Yahoo Pipes

      Yahoo Pipes is a web application from Yahoo! that provides a graphical user interface for building data mashups that aggregate web feeds, web pages, and other services, creating web-based apps from various sources, and publishing those apps. The application works by enabling users to "pipe" information from different sources and then set up rules for how that content should be modified (for example, filtering).

      Twet initially started as a Yahoo Pipes mashup. However, we gave up on Pipes because it gave us too little control over string operations. Also, Twitter limits the number of requests made by Yahoo Pipes.

      [Figure: Yahoo Pipes Designer]
  12. 2.4 Twet-Twitter

      Twet-Twitter is a PHP web service that returns a geo-tagged RSS feed with the 10 most relevant tweets that contain a search word (or its synonyms). This service is currently hosted at http://lucianpricop.is-a-geek.net/twitter

      To obtain the synset of the desired word, we simply call the Twet-Wordnet service (described later in this document). After obtaining the list of synonyms, the simplest way to get the Twitter content is the PHP file_get_contents function:

          file_get_contents('http://search.twitter.com/search.atom?q=twitter');

      However, this method requires the PHP configuration option allow_url_fopen to be enabled, which allows reading data from remote files. Not all web hosts enable this setting, for security reasons. In addition, Twitter applies a lower rate limit to requests sent to its web services if they do not appear to originate from a browser. It checks this by looking at the User-Agent header of the HTTP request, so we need a way to set this header to an acceptable value before sending a request to Twitter.

      The libcurl PHP library allows connections and communications with many different types of servers over many different protocols. libcurl currently supports the http, https, ftp, gopher, telnet, dict, file, and ldap protocols. It also supports HTTPS certificates, HTTP POST, HTTP PUT, etc., and it allows us to send a value for the User-Agent header.
      Here is how we use libcurl's functions to achieve our goal:

          // Send the search as a GET query string; the search term must be URL encoded.
          $url = 'http://search.twitter.com/search.atom?lang=en&q=' . urlencode($q);
          $ch = curl_init();
          curl_setopt($ch, CURLOPT_URL, $url);
          curl_setopt($ch, CURLOPT_HEADER, false);        // keep response headers out of the output
          // Present a browser User-Agent so the request gets the higher rate limit.
          curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2) Gecko/20100115 Firefox/3.6');
          curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
          $xml = curl_exec($ch);
          curl_close($ch);

      Getting a geotagged RSS feed from Twitter

      Twitter launched its geotagging API recently (November 2009), but users need to update their profile in order to allow Twitter to geographically tag their posts. Most users don't know about this option and probably don't care, so they haven't opted in, and most tweets returned by the Twitter search API are not geotagged. However, we considered the geo tag very important and decided to work around this problem by using Twitter's user details API. This API allows us to get the public details of users.

  13. These details include the textual location, which can be translated to geographical latitude and longitude with the help of a web service documented at http://www.geonames.org/export/geonames-search.html. This service returns exactly what we need, so we can add the <geo:lat> and <geo:long> tags to each tweet. The only issue is with Twitter users who don't make their location public or who enter fictitious locations. There is not much we can do about it, so we decided to geotag these users' tweets to the middle of the Atlantic Ocean :)

      A Twitter RSS feed entry looks like this:

          <entry>
            <id>tag:search.twitter.com,2005:8191823850</id>
            <published>2010-01-25T13:39:59Z</published>
            <link type="text/html" href="http://twitter.com/Tudoor/statuses/8191823850" rel="alternate"/>
            <title>came back from school looking like an popsicle :-j ... -25 degrees Celcius in iasi :-ss</title>
            <content type="html">came back from school looking like an popsicle :-j ... -25 degrees Celcius in &lt;b&gt;iasi&lt;/b&gt; :-ss</content>
            <updated>2010-01-25T13:39:59Z</updated>
            <link type="image/png" href="http://a3.twimg.com/profile_images/582822757/myface2_normal.jpg" rel="image"/>
            <twitter:geo> </twitter:geo>
            <twitter:source>&lt;a href="http://echofon.com/" rel="nofollow"&gt;Echofon&lt;/a&gt;</twitter:source>
            <twitter:lang>en</twitter:lang>
            <author>
              <name>Tudoor (Tudor Necula)</name>
              <uri>http://twitter.com/Tudoor</uri>
            </author>
            <geo:lat>47.1666667</geo:lat>
            <geo:long>27.6</geo:long>
          </entry>
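The geotagging step described above, including the fallback for users without a usable location, can be sketched as follows. This is a hedged Python illustration rather than the PHP used by the Twet-Twitter service: the function name `geotag_entry`, the injected `geocode` callable (standing in for the GeoNames lookup), the fallback coordinates and the choice of the W3C Basic Geo namespace URI for the `geo` prefix are all assumptions, since the report does not state them.

```python
import xml.etree.ElementTree as ET

# Assumption: the geo: prefix is bound to the W3C Basic Geo vocabulary.
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"
# Hypothetical fallback point in the middle of the Atlantic Ocean.
FALLBACK_COORDS = (30.0, -40.0)

ET.register_namespace("geo", GEO_NS)


def geotag_entry(entry, location, geocode):
    """Append <geo:lat> and <geo:long> to a feed entry element.

    `location` is the free-text location from the user's profile (or None),
    and `geocode` is a callable returning (lat, long) or None; it stands in
    for the GeoNames search service.
    """
    coords = geocode(location) if location else None
    if coords is None:
        # Missing or fictitious location: place the tweet in the ocean.
        coords = FALLBACK_COORDS
    lat, lon = coords
    ET.SubElement(entry, "{%s}lat" % GEO_NS).text = str(lat)
    ET.SubElement(entry, "{%s}long" % GEO_NS).text = str(lon)
    return entry
```

Keeping the geocoder as a parameter means the fallback logic can be exercised without calling any external service.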
  14. 2.5 Twet-Wordnet

      The Twet-Wordnet web service takes a list of space-separated words and returns, in XML format, a list of all the synonyms of all these words, including the given words themselves. The service is hosted at http://lucianpricop.is-a-geek.net/wordnet.php?

      For example, accessing http://lucianpricop.is-a-geek.net/wordnet.php?words=bubble will return:

          <SYNSET>
            <SYN>bubble</SYN>
            <SYN>house of cards</SYN>
            <SYN>belch</SYN>
            <SYN>burp</SYN>
            <SYN>eruct</SYN>
            <SYN>babble</SYN>
            <SYN>burble</SYN>
            <SYN>guggle</SYN>
            <SYN>gurgle</SYN>
            <SYN>ripple</SYN>
          </SYNSET>

      At the beginning of this project, our service relied on another web service, provided by Mr Bernard Bou at http://jws-champo.ac-toulouse.fr:8080/wordnet-xml/servlet. That service was called for each separate word, and from the resulting XML all the synonyms from each sense of each category of each part of speech were collected and returned. However, because that service was not reliable, we chose to download the WordNet database from http://wordnet.princeton.edu/wordnet/download/ and implement the data extraction ourselves.
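A client of the Twet-Wordnet service only needs to read the <SYN> elements out of the response shown above. The following Python sketch illustrates one way to do that; the function name `parse_synset` is hypothetical, and the de-duplication is an added assumption for the case where several input words share a synonym.

```python
import xml.etree.ElementTree as ET


def parse_synset(xml_text):
    """Return the synonyms listed in a <SYNSET> response, in document order.

    Duplicates are dropped (an assumption, useful when several input
    words share a synonym); empty <SYN> elements are ignored.
    """
    root = ET.fromstring(xml_text)
    seen = set()
    synonyms = []
    for node in root.iter("SYN"):
        word = (node.text or "").strip()
        if word and word not in seen:
            seen.add(word)
            synonyms.append(word)
    return synonyms
```

The resulting list can then be used as the set of search terms for the Twitter and Flickr queries.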
  15. References

      1. "Aplicaţii hibride: mashup-uri" (in Romanian), in S. Buraga (ed.), "Programarea în Web 2.0", Polirom Publishing House, Iaşi, 2007
      2. "Mashing Up Feeds Using Yahoo Pipes" - http://www.devlounge.net/code/mashing-up-feeds-using-yahoo-pipes
      3. Kim Cavanaugh, "Yahoo! Pipes: An Introduction" - http://www.communitymx.com/content/article.cfm?cid=86E4B
      4. Yahoo Maps geocoding API - http://digitalcolony.com/2007/01/using-yahoo-maps-geocoding-api-in-c.aspx
      5. Twitter API Documentation - http://apiwiki.twitter.com/Twitter-API-Documentation
      6. WordNet Documentation - http://wordnet.princeton.edu/wordnet/documentation/
