Sweeper User Guide v0.3


User Manual for the Sweeper application from Swiftly.org and Ushahidi.

Published in: Technology

  1. 1. User Guide
     Updated April 3, 2011
     Living Document for Sweeper v0.3
     http://swiftly.org/userguide
  2. 2. Table of Contents
     I. Introduction
     II. Using this Living Document
     III. About the Sweeper Application
        Suggested Uses
          As a FeedReader
          For Passive Data-Processing
          For Active Content Filtering
          For Real-time Social Media Curation
          As a Vertical Content Dashboard
        Terminology
     IV. Explaining the Sweeper UI
        Analytic Dashboard
        Main Content Window
        Admin Panel
        View Tabs
        Filter Panel
        Refresh Staging Area
        Rating Panel
        Content Items
     V. Overview of Plugins
        Duplicate Content Filter
        Google Language Services
        Geo-Location (Yahoo)
        Tagging
        Ushahidi Push
        Tag Clustering
        Annotations*
        Quiver/Bookmarking*
     VI. Adding Sources
        Email (IMAP)
        Email (Gmail)
        FrontlineSMS
        News & Blog Search
        RSS/ATOM
        Flickr
        SMS Gateways
        Twitter
  3. 3. I. Introduction
     Thanks for using the Sweeper application! Sweeper is meant to be fairly intuitive, but we’re well aware that getting started and knowing what’s possible can be a little overwhelming at first. In this guide we will walk you through using Sweeper and a handful of the native plugins. This is not a guide for installing it (for that look here); rather, it covers use of the Sweeper software and the various plugins for it. If you are a developer seeking information on how to develop plugins, parsers or other modules for Sweeper and other SwiftRiver applications, click here.
  4. 4. II. Using this Living Document
     Because Sweeper is an open-source product whose code and feature set change quite frequently, this user guide is a living document that serves only as a snapshot of what’s possible at the time it was last updated. We invite you to revisit this link often. If you decide to print it, just be aware that as soon as it’s transferred from bits to pulp, it’s essentially outdated.
     Likewise, any copy of this document distributed in PDF, DOC, or FLV form is likely outdated. The latest version can always be found at http://swiftly.org/userguide/
  5. 5. III. About the Sweeper Application
     Sweeper is an application that focuses on the aggregation, curation and filtering of real-time content. It assumes the user knows exactly what sources they are tracking but needs an application to help them prioritize their attention. As a comparison, Sweeper is something like an open-source version of TweetDeck or, to use a Google analogy, Google Reader. The user defines a number of sources to track, and Sweeper offers a number of ways of filtering and viewing the collected content.
     Suggested Uses
     What can Sweeper be used for? A number of things, but here are a few ideas...
     As a FeedReader
     Sweeper was designed for collecting large amounts of disparate real-time data and sweeping through it quickly and efficiently, while also doing things to that content. So there is an emphasis on speed and on summarizing large datasets, letting the user decide where to spend his or her time delving deeper. As mentioned in the comparison above, one might consider using Sweeper as a substitute for a traditional feed reader. However, unlike most feed readers, there are no restrictions on the type of data that can be aggregated, and there are smart triggers applied to data going out. ex. If I perform this function, content is affected in this way. This functionality can be useful for setting up really advanced conditional tasking, which we’ll cover later.
     For Passive Data-Processing
     Sweeper can also be configured as a passive filter for data, meaning you can set it to aggregate content and then automatically perform certain tasks on it. ex. Aggregate all tweets from #hashtag tagged in the state of Maine and send only that data to another platform. When used in this way, Sweeper essentially becomes a smart cron tool equipped with geo-tagging, natural language processing and other powerful contextual features.
     For Active Content Filtering
     Users are also provided a number of utilities for quickly searching through content. Clicking on a selection of tags allows the user to see only the content carrying those tags. The cluster panel allows content to be clustered around similar content from various channels. The user can also sort by assigned scores (which can represent the favor they might have for some types of content over others) using any range between 1 and 100. ex. show me only the content with a score of 40 or above; or only content between 20 and 60.
     For Real-time Social Media Curation
     Sweeper can be used for real-time media curation across channels (Blogs, News, RSS/ATOM, Twitter, SMS, Email) and across over 50 languages. For a journalist attempting to collect data
  6. 6. that’s rapidly unfolding across social media, this can save a great deal of time. Rather than opening 50 different windows for different apps, the Sweeper application can be used to mine and add context to disparate content, completely at the user’s whim. Perhaps even more interestingly, all this aggregated data can be annotated, mapped, shared or exported in a number of ways after it’s been structured as the user sees fit.
     As a Vertical Content Dashboard
     Perhaps you need to know what’s going on across various industries at all times. You could enter the feeds of several well-known bloggers, the @twitternames of thought leaders in that industry, a public-facing email address you control like sports@mynewsite.com, and a public-facing shortcode (ex. 6060). That might just be your sports page. But when you replicate that experience across Entertainment, World News, Food, Lifestyle etc., you end up with an equally rich, immersive real-time data-mining tool across all those interests.
     Terminology
     Before we continue, it will help if you have a basic understanding of the terminology we use to discuss the application.
     Sweeper (capital ‘S’) - the name of a SwiftRiver application for aggregating and processing feeds of content
     sweeper (lowercase ‘s’) - generally, one who performs the function of sweeping through feeds of content. However, in the Sweeper application the user role of sweeper is assigned to users who can edit tags and process content but who don’t have administrative rights to the application.
     sweep - to process data
     channel - the distribution type used to deliver content. Twitter, Email, RSS/ATOM, SMS are all channels.
     source - the place (or person) from which content originates. A person’s @twittername, email address, blog or web url, or phone number would all be considered sources. Several sources may be collected to reference a single identity. ex. this blog, this url, this phone number all belong to the same person
     content item - a single item of content collected from a feed, regardless of the channel it came in on or the source it came from
     tag - a layer of taxonomy applied to all content
     lat/lon - geospatial coordinates; short for latitude and longitude
     veracity - more accurately, the subjective favor the user (or users) has for content. The baseline of favor expressed for certain types of content is used as a building block for a score applied to content. This score is then used both for prioritizing sources and for recommending other content the user or users may favor.
     cluster - a collection of content items deemed to be statistically similar based on tags
     editors - editors don’t have full administrative rights to the application but they can perform tasks that sweepers can’t.
     turbine - another word for plugins for SwiftRiver applications
     impulse turbine - plugins that pre-process content (before the application receives it). Impulse Turbine plugins affect how data is structured as part of the Swift object module.
     reactor turbine - plugins that process content based on human interaction or assigned logic (after the application has received it). Reactor Turbine plugins can be used to take structured data and do
  7. 7. something with it.
     parsers - on the application architecture level, parsers are modules that can be written to create new sources
     trusted source - a source given a default score of 100, so that the user votes against a high score as the default. ex. you have my trust now but could lose it over time
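The terminology above can be pictured as a small data model. What follows is a minimal sketch in Python, not Sweeper's actual object model: the class names, the field names and the starting score of 50 for ordinary sources are assumptions made for illustration. Only the trusted-source default of 100 comes from the definitions above.

```python
from dataclasses import dataclass, field
from typing import List

TRUSTED_DEFAULT_SCORE = 100  # from the guide: a trusted source starts at 100
ASSUMED_DEFAULT_SCORE = 50   # assumption: the guide gives no default for other sources


@dataclass
class Source:
    """The place or person content originates from, e.g. '@bobsmith'."""
    identifier: str
    channel: str  # the distribution type, e.g. "Twitter", "Email", "RSS/ATOM", "SMS"
    trusted: bool = False
    score: int = field(init=False, default=ASSUMED_DEFAULT_SCORE)

    def __post_init__(self) -> None:
        # A trusted source begins with full trust and can only lose it over time.
        if self.trusted:
            self.score = TRUSTED_DEFAULT_SCORE


@dataclass
class ContentItem:
    """A single item of content, whatever channel or source it came from."""
    text: str
    source: Source
    tags: List[str] = field(default_factory=list)


bob = Source("@bobsmith", "Twitter", trusted=True)
item = ContentItem("Road closed near the bridge", bob, ["road", "bridge"])
```

Keeping the score on the source rather than on the content item mirrors the definitions above, where votes on content feed into how sources are prioritized.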
  8. 8. IV. Explaining the Sweeper UI
     Now that we’ve got the basics, we can walk you through the Sweeper user interface and its basic features and functions. At first look the application can be a little intimidating, so hopefully this guide takes the edge off (like a martini!).
  9. 9. Analytic Dashboard
     This dashboard offers a quick survey of the content being collected by Sweeper. Where is data mostly being collected from? How much content in total? How much from each channel? The charts are dynamic and update with each use of the application.
  10. 10. Main Content Window
      Below you see the main content display window. This is where aggregated content can be viewed.
  11. 11. Admin Panel
      This area contains the following tabs: Login, Impulse Turbines, Reactor Turbines, Sources, Add User
      Login - as you might expect, this area allows users to log in to the application
      Impulse Turbine - for enabling or disabling impulse turbine plugins
      Reactor Turbine - for enabling or disabling reactor turbine plugins
      Sources - this is the area where one can add sources to aggregate into Sweeper
      Users - area for adding users and assigning their administrative rights
  12. 12. View Tabs
      This area contains several tabs for altering the view of the main content window. The titles are fairly self-explanatory: Dashboard, New content, Accurate, Inaccurate, Crosstalk, Irrelevant
      Dashboard - contains a collection of charts plotting various aspects of the content being collected
      New content - for viewing new content as it’s being collected
      Accurate - shows all content voted up
      Inaccurate - shows all content voted down
      Crosstalk - shows content that is completely off-topic
      Irrelevant - shows content that is on-topic but not relevant to the user’s specific needs
  13. 13. Filter Panel
      Filters for changing the view of the main content window.
      Veracity Slider - allows the user to set any range between 1 and 100 to view content by assigned score
      Channels - view only the content that came in on a particular channel
      Tags - view only the content containing a selection of tags
      Refresh Staging Area
      Reveals how much content has been aggregated since the main content window was last refreshed.
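The three filters above can be thought of as tests that a content item must pass to stay visible. The following is a rough Python sketch of that idea, not Sweeper's own code: the dictionary keys (`id`, `score`, `channel`, `tags`) and the function name are hypothetical, and it assumes an item must carry all of the selected tags, which the guide does not spell out.

```python
from typing import Any, Dict, Iterable, List, Optional

Item = Dict[str, Any]  # hypothetical shape: {"id", "score", "channel", "tags"}


def filter_items(
    items: Iterable[Item],
    min_score: int = 1,
    max_score: int = 100,
    channel: Optional[str] = None,
    tags: Optional[Iterable[str]] = None,
) -> List[Item]:
    """Return the items that pass every active filter."""
    wanted = set(tags or [])
    visible = []
    for item in items:
        if not (min_score <= item["score"] <= max_score):
            continue  # outside the veracity slider range
        if channel is not None and item["channel"] != channel:
            continue  # came in on a different channel
        if not wanted.issubset(item["tags"]):
            continue  # assumption: every selected tag must be present
        visible.append(item)
    return visible


items = [
    {"id": 1, "score": 85, "channel": "Twitter", "tags": ["flood", "maine"]},
    {"id": 2, "score": 35, "channel": "SMS", "tags": ["flood"]},
    {"id": 3, "score": 55, "channel": "Twitter", "tags": ["sports"]},
]

# "Show me only the content with a score of 40 or above."
high = filter_items(items, min_score=40)
# "Only content between 20 and 60."
middle = filter_items(items, min_score=20, max_score=60)
```

Leaving a filter at its default (full score range, no channel, no tags) makes it a no-op, which matches a panel where nothing has been selected yet.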
  14. 14. Rating Panel
      The upper left part of the Rating Panel is for quickly determining information about content. Is this a ‘trusted’ source, or has it been rated as trusted by the people within your bounded (or unbounded) group of users?
      The upper right quadrant shows a score that represents the favor the user or their community has for the associated source.
      In the lower quadrant we have four buttons. Here is what they essentially do:
      Green (Up) - expresses favor for a content item while positively affecting its source’s score, so that in the future content from the same source will be prioritized.
      Red (Down) - expresses disapproval for a content item while negatively affecting its source’s score, so that in the future content from the same source will be deprioritized.
      Crosstalk - expresses that this content is not relevant because it’s essentially been collected by mistake and is not useful. Removes it from the main view without negatively affecting the source score.
      Irrelevant - expresses that this content is not germane to the task the user is trying to perform and, more importantly, is somehow damaging or distracting. Removes the content from the main view while also negatively affecting the source score.
      It’s important to note that these votes, whether up or down, are not the only things being factored into the scoring of content. We also factor in a number of things like the tag profile of content, the ratings of the individual users who are doing the rating, and other factors. For an in-depth explanation see the RiverID System Guide.
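As a rough illustration of how a vote might move a source score, here is a small Python sketch. It is deliberately simplified and is not the RiverID scoring method: it models only the Up, Down and Crosstalk buttons, the step of 5 points per vote is an invented value, and it ignores the tag profile and rater reputation mentioned above. The 1 to 100 bounds follow the score range used elsewhere in this guide.

```python
MIN_SCORE, MAX_SCORE = 1, 100
VOTE_STEP = 5  # assumption: the guide does not say how far one vote moves a score


def apply_vote(source_score: int, vote: str) -> int:
    """Return the source score after a single vote on one of its content items."""
    if vote == "up":
        changed = source_score + VOTE_STEP
    elif vote == "down":
        changed = source_score - VOTE_STEP
    elif vote == "crosstalk":
        # Collected by mistake: hidden from the main view, source score untouched.
        changed = source_score
    else:
        raise ValueError(f"unknown vote: {vote!r}")
    # Keep the result inside the 1-100 range.
    return max(MIN_SCORE, min(MAX_SCORE, changed))


score = 100                         # a trusted source starts at the top
score = apply_vote(score, "down")   # one bad item costs it some trust
score = apply_vote(score, "crosstalk")
```

Clamping matters for trusted sources in particular: a source already at 100 cannot gain anything from an up vote, so the only direction it can move is down.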
  15. 15. Content Items
      Content items are divided into three sub-sections: the Header, the Body and the Footer.
      In the Header you’ll find an icon denoting what channel this content came in on: Twitter, Email, SMS, or RSS/Atom. Clicking this icon will reveal more:
      A pop-up display reveals information about the source and the content itself:
      Source - the source of the content (a Twitter @name, email address, url or phone number)
      Channel - the channel the content came in on (Twitter, Email, SMS, or RSS/Atom)
      Source Score - the trust score associated with this source
      Link - hyperlink to the original content
  16. 16. In the Body you’ll find a portion of the message (from Twitter and SMS) or the headline/subject (Articles, Blogs, Email).
      In the Footer you’ll find tags, which add a layer of taxonomy to the content. You can quickly find other content like this particular content item by clicking on the tags themselves. Users can also add their own tags*, edit tags* or delete tags to help the system improve**.
      * Adding tags and editing tags is not possible in v0.3.0 of the Sweeper UI. However, a slight modification of the code exposes this feature and makes it available.
      ** There is an active learning element of our Tagging API, available soon, that allows the system to learn from user feedback. You can read more about this in the section on Impulse Reactor Plugins.
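The rule for what appears in the Body can be sketched in a few lines of Python. This is only an illustration of the behaviour described above: the function name, its parameters and the 80-character preview length are assumptions, since the guide says only that "a portion" of the message is shown.

```python
MESSAGE_CHANNELS = {"Twitter", "SMS"}  # channels whose body shows part of the message


def body_text(channel: str, message: str = "", title: str = "", limit: int = 80) -> str:
    """Pick the text shown in a content item's Body.

    `limit` is an assumed preview length, not a documented Sweeper setting.
    """
    if channel in MESSAGE_CHANNELS:
        if len(message) <= limit:
            return message
        return message[:limit].rstrip() + "..."
    # Articles, blogs and email show their headline or subject instead.
    return title


preview = body_text("SMS", "Water is rising fast on Main Street", limit=14)
subject = body_text("Email", message="Long email text", title="Weekly report")
```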
  17. 17. V. Overview of Plugins
      A few plugins ship with Sweeper and are either enabled by default or commonly used. There are far too many plugins to list here, so in this section we’ll explain what a few of the available plugins are and what they are used for.
      You can always find more plugins for Swiftly applications at http://plugins.swiftly.org
      Duplicate Content Filter
      When activated, this plugin passes all content through the Duplication Filter API in the Swift Web Service stack, effectively removing all duplicate content (like retweets) from a feed.
      Google Language Services
      When activated, this plugin passes all content through the Google Translate API. Google Translate will automatically detect what language the content is in, translate it and send it back. This allows you to aggregate content in multiple languages but see only the resulting translated, English content! This is a huge time saver when doing international research.
      But how do you know what content has been translated? When the plugin is activated, additional info in the content item’s header lets the user know what has been translated, and from what language. See the example above.
      If you expect large amounts of data you may want to opt for the Google Enterprise Language Service
  18. 18. plugin instead. With this plugin the amount of content that can be translated is increased significantly. It requires an API key from Google. If you need help getting Enterprise level access, contact us at support@swiftly.org
      Geo-Location (Yahoo)
      When activated, this plugin passes all content through the Yahoo Placemaker API, where we try to detect the location the content is likely to have originated from. We then apply lat/lon coordinates to the content, which are stored as part of the content meta info. When passed to other systems, this lat/lon info can be used for geo-spatial reference.
      To use this service, you’ll need to acquire a Yahoo Placemaker API key from Yahoo. If you need help getting Enterprise level access, contact us at support@swiftly.org
      Tagging
      When activated, all content passing through Sweeper will be tagged by our natural language processing API. Essentially this service tries to extract what it thinks are the active keywords being used, and uses them to help the user automatically sort content.
      Tags are very important to SwiftRiver, and we take a dual taxonomic and folksonomic approach in our applications. That is, although these tags are machine generated, they can be edited and improved upon by humans, which in turn helps to teach the algorithm how to tag content better.
      Ushahidi Push
      For users of Ushahidi or Crowdmap. This will take any content voted up in the Rating Panel and automatically plot it on a designated Ushahidi deployment map as an approved report. This is a significant time saver for large groups who want to use Sweeper to curate data, but use Ushahidi or Crowdmap to visualize it.
  19. 19. Users will need to enter an API key for an Ushahidi deployment that they have administrative rights to. ex. http://xxx.xxx.xx.xxx/ushahidi/
      There are many variants of this plugin. One is called Ushahidi Passive Push, and essentially it turns Sweeper into a cron suite where content is automatically aggregated, structured, and passed along to Ushahidi...mostly without any human operators!
      Tag Clustering
      When activated, this plugin allows the user to view content similar to any particular content item. The clustering is done by using a statistical profile of the associated tags for proximity matching. This gives the user more control than alternative recommendation methods, because it can factor in the user’s own tagging methods. For instance, if I use unique identifiers or words unique to my organization, they too can be used as part of the proximity matching algorithm!
      Annotations*
      Annotations offers the ability to annotate any content item. This can be used to leave individual notes for reference, or to collaboratively converse around content with your team.
      Quiver/Bookmarking*
      Quiver is a bookmarklet that lets you quickly collect content from around the web and post it to your Sweeper deployment (effectively adding it to your quiver). This can be useful for individually collecting research, or if you have teams of contributors actively recommending content for you to then apply all our contextual APIs to.
      * These features will ship with the forthcoming release of Sweeper.
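To give a feel for what tag-based proximity matching can look like, here is a minimal Python sketch. The guide does not say which statistic the Tag Clustering plugin uses, so this example swaps in Jaccard similarity (shared tags divided by all tags) as a stand-in; the function names, the item ids and the 0.5 threshold are all assumptions for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tags two items have in common (0.0 = none, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_items(
    target_id: str, tags_by_item: Dict[str, Set[str]], threshold: float = 0.5
) -> List[str]:
    """Ids of items whose tags are close to the target's, most similar first."""
    target = tags_by_item[target_id]
    scored = [
        (jaccard(target, tags), item_id)
        for item_id, tags in tags_by_item.items()
        if item_id != target_id
    ]
    # Sort by descending similarity, then by id so ties come out in a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored if score >= threshold]


tags_by_item = {
    "a": {"flood", "maine", "rescue"},
    "b": {"flood", "maine"},
    "c": {"sports"},
    "d": {"flood", "maine", "rescue"},
}
cluster = similar_items("a", tags_by_item)
```

Because the comparison works on whatever tags are attached, tags a team adds by hand (an internal project code, say) take part in the matching just like machine-generated ones.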
  20. 20. VI. Adding Sources
      To begin using Sweeper at all, one must begin aggregating from predefined sources. Essentially this is where you inform the system what you want to track. Sweeper currently only accepts inputs that are updated streams of data - feeds - in XML/ATOM/RSS or JSON format.
      To get any content we don’t currently accept into Sweeper, all one would need to do is write a parser, a few lines of code that tell the application how to structure data coming from that particular feed.
      The types of content natively supported are IMAP, Gmail, FrontlineSMS, GoogleNews, any RSS or Atom feed, Flickr, other SMS gateways and Twitter.
      Email (IMAP)
      Sweeper will accept the IMAP details of any email account and begin pulling in content, allowing you to aggregate, translate, tag and cluster your email.
      Email (Gmail)
  21. 21. Sweeper supports aggregating email from any Gmail account, pulling in content and allowing you to aggregate, translate, tag and cluster your email. Although Gmail also supports IMAP, the native Gmail aggregation is recommended.
      FrontlineSMS
      In combination with FrontlineSMS, Sweeper can become a powerful SMS curation service that aggregates real-time content (SMS) even if there is no internet connection! There are two ways of integrating FrontlineSMS with Sweeper: Remote and Local.
  22. 22. The remote option is for users who have access to some type of network, whether via the Internet or just a LAN. Simply enter the details of the FrontlineSMS deployment you want to pull data from. You will need to use this in combination with the FrontlineFetch go-between servlet, which can be downloaded from http://plugins.swiftly.org/?p=51.
  23. 23. The local option requires that the Sweeper deployment and FrontlineSMS be installed on the same machine or server. This allows the Sweeper application to pull directly from the FSMS database and will work even if there is no Internet.
      News & Blog Search
      This source module allows you to set up a keyword search, returning real-time search results from Google News, Posterous, Blogger and Wordpress.com. The results will appear in the main content view, translated if necessary.
      RSS/ATOM
  24. 24. Self-explanatory: simply enter the URL of a feed in the RSS, ATOM 1.0 or ATOM 2.0 service and Sweeper will begin aggregating that content.
      Flickr
      This service allows the user to aggregate content from the photo-sharing service Flickr.
      The options are fairly simple. Tag Search will return results aggregated from Flickr based on a search using a specific keyword, ex. cats, dogs, Eiffel Tower. Tag Search with Location will only return geo-tagged results, great when used in combination with a mapping platform like Crowdmap. Follow User returns only the results from a specific user account.
      SMS Gateways
  25. 25. We’ve included a generic SMS gateway aggregator. It’s set up to read from the HTTP posts commonly used by services that don’t have APIs. However, it’s there largely to fork and modify - a head start on integrating your own SMS service.
      Twitter
      Culling content from Twitter is easy. There are two options: Search and Follow User.
      With Search, the user enters a name for the search (a name that has relevance to you) followed by the term(s) they would like to search for. These can be common words or hashtags. ex. ‘My Twitter Search’
  26. 26. and ‘#searchword’. There is no limit to the number of search queries one can have; however, the return of results is limited by your individual access to the Twitter search API. If you’d like to increase this access, contact Twitter to get white-listed or contact support@swiftly.org.
      Note on Sources and Search: When using a Twitter search, please note that the search itself is not a source. In the Swiftly eco-system, content producers are sources. This means that we will identify all the individual content producers and help you keep track of them. This allows one to monitor conversations around keywords that might lead to great content producers.
      With Follow User, the user enters a unique name for the Twitter handle they want to follow, along with the actual @name on Twitter. For example, ‘Bob Smith, Rwanda’ alongside ‘@bobsmith’. This is helpful because it lets you leave notes about who you are following, for yourself or your team members.
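The note on sources and search can be made concrete with a short Python sketch: the search only supplies results, and the sources are the distinct producers found within them. This is an illustration, not Sweeper's parser code. The result dictionaries with `author` and `text` keys are a hypothetical shape, not the Twitter search API's actual response format.

```python
from typing import Dict, Iterable, List


def sources_from_search(results: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group search results by the producer who wrote them.

    Each result is assumed to be a dict with hypothetical 'author' and 'text' keys.
    """
    by_source: Dict[str, List[str]] = {}
    for result in results:
        # Normalise the handle so '@BobSmith' and 'bobsmith' count as one source.
        handle = "@" + result["author"].lstrip("@").lower()
        by_source.setdefault(handle, []).append(result["text"])
    return by_source


results = [
    {"author": "BobSmith", "text": "Bridge is out #searchword"},
    {"author": "@bobsmith", "text": "Crews arriving now #searchword"},
    {"author": "jane", "text": "Any updates? #searchword"},
]
sources = sources_from_search(results)
```

One search therefore yields several sources, and a producer who keeps turning up in results for a keyword becomes a candidate for Follow User.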