Creating streams with DataSift


This slide deck walks through how to create DataSift Streams and introduces FSDL, the language used to define them.

Published in: Technology

  1. Creating Streams with DataSift
  2. Creating a Stream: Workflow
  3. Creating a Stream: Specification
     Work out what you want your stream to do
     What do you want the elements to contain?
     What sources do you want the data to come from?
     What is your budget for data acquisition?
     Who is this data for?
  4. Creating a Stream: Definition
     Write a Stream Definition that executes your specification
  5. Creating a Stream: Filtered Data
     Retrieve the data that is filtered by your stream
     JSON API
     HTTP Streaming
     WebSockets Streaming
     RSS
  6. Creating a Stream in DataSift
     1. Select the Create Stream button on any page on DataSift
  7. Creating a Stream in DataSift
     2. Fill in the title, description, and tags for your Stream
     The Title and Description will be shown next to your Stream
     The Tags will be used for search and categorisation of your Stream
     Enabling the Private checkbox will make your Stream visible only to you
  8. Creating a Stream in DataSift
     3. Create your first stream definition
     This is the Stream Editor
     There is a default stream definition already inserted for you
     Why not try changing "hello world" to a different value?
     e.g. interaction.content contains "cat"
  9. Creating a Stream in DataSift
     4. Hit the Save button
     Your Stream is now saved
     You can use the breadcrumbs to go back to see a live preview of the results
  10. FSDL: Filtered Stream Definition Language
      FSDL is the language used to write Stream Definitions for DataSift.
      The language takes the following basic format:
      <term> <logical operator> <term> <logical operator>
      There must be a minimum of 1 term in a definition.
      All terms must be separated by logical operators.
      A logical operator is either "and" or "or".
  11. FSDL: Nested Rule
      On the previous slide, we had this definition outline:
      <term> <logical operator> <term> <logical operator>
      A term can be either a "nested rule" or a "predicate".
      A nested rule includes the result of another stream within the logic of this one.
      The syntax for a nested rule is:
      rule "<stream identifier>"
      where the stream identifier is a 32-character alphanumeric string, obtainable from the DataSift page of the stream you wish to include or through the API.
  12. FSDL: Nested Rule Example
      This is an example of a simple FSDL definition:
      interaction.content contains "justinbieber"
      The Stream Identifier for this definition is 4e8e6772337d0b993391ee6417171b79. The stream will contain all items whose content contains "justinbieber".
      We can create another rule to filter this down further, using the nested rule syntax:
      rule "4e8e6772337d0b993391ee6417171b79" and language.tag == "en"
      This performs the same filtering as the first stream, but includes only content determined to be in English, using the language.tag == "en" predicate.
      In this case, the logical operator separating the two terms is "and".
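The nested rule on slide 12 can also be assembled programmatically. The following is a minimal Python sketch, not part of any DataSift library: the helper names `nested_rule` and `english_only` are hypothetical, and the check assumes only what slide 11 states, namely that a stream identifier is a 32-character alphanumeric string.

```python
import re

# Assumption taken from slide 11: identifiers are exactly 32 alphanumeric characters.
STREAM_ID = re.compile(r"[0-9A-Za-z]{32}")


def nested_rule(stream_id: str) -> str:
    """Return an FSDL nested-rule term for the given stream identifier (hypothetical helper)."""
    if not STREAM_ID.fullmatch(stream_id):
        raise ValueError(f"not a 32-character alphanumeric identifier: {stream_id!r}")
    return f'rule "{stream_id}"'


def english_only(stream_id: str) -> str:
    """Narrow an existing stream to English content, mirroring the slide's example."""
    return f'{nested_rule(stream_id)} and language.tag == "en"'


definition = english_only("4e8e6772337d0b993391ee6417171b79")
print(definition)
# prints: rule "4e8e6772337d0b993391ee6417171b79" and language.tag == "en"
```

Validating the identifier before building the string catches copy-and-paste mistakes early, before the definition is ever submitted.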
  13. FSDL: Predicates
      Predicates are formed of 3 items: a target, an operator, and an argument, in the following format:
      <target> <operator> <argument>
      In the previous example, we saw this predicate used to filter the results of another rule:
      language.tag == "en"
      In this example, the target is "language.tag", the operator is "==" (equals), and the argument is "en".
      The DataSift Support Documentation has a full list of targets, operators, and the arguments they require.
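The target, operator, and argument structure from slide 13 maps naturally onto a small record type. This is an illustrative Python sketch only: the `Predicate` class is hypothetical, it quotes string arguments and leaves numbers bare as the slides' examples do, and it makes no attempt to escape quotes inside an argument because the slides do not describe FSDL's escaping rules.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Predicate:
    """One FSDL predicate: <target> <operator> <argument> (hypothetical helper type)."""

    target: str
    operator: str
    # None covers operators that are written without an argument.
    argument: Optional[Union[str, int]] = None

    def render(self) -> str:
        parts = [self.target, self.operator]
        if isinstance(self.argument, str):
            parts.append(f'"{self.argument}"')  # string arguments are double-quoted
        elif self.argument is not None:
            parts.append(str(self.argument))  # numeric arguments are written bare
        return " ".join(parts)


language = Predicate("language.tag", "==", "en")
friends = Predicate("twitter.user.friends_count", ">=", 1000)
print(language.render())  # prints: language.tag == "en"
print(friends.render())   # prints: twitter.user.friends_count >= 1000
```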
  14. FSDL: Example Predicates
      The following are some examples of simple predicates:
      interaction.content contains "#rdgtweetup"
      twitter.user.friends_count >= 1000
      interaction.content contains_word "net"
      interaction.geo exists
      author.username in "dtsn,nickhalstead,chris_alexander,datasift"
  15. FSDL: Example Definitions
      Here are examples of more complex definitions composed of multiple terms:
      (interaction.content contains "Justin Bieber"
      OR interaction.content contains "Justin Beiber")

      (interaction.content contains "Nokia"
      OR interaction.content contains "Motorola"
      OR interaction.content contains "Palm")
      AND interaction.content contains "phone"

      interaction.content contains "#rdgfestival"
      OR interaction.content contains "#readingfestival"
      OR rule "4315e367618830de6224c479f35db4ca"
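Definitions like those on slide 15 follow a regular pattern: a parenthesised group of alternatives joined by OR, combined with further terms by AND. Here is a rough Python sketch of that composition. The helpers `contains`, `any_of`, and `all_of` are hypothetical names chosen for illustration, and the sketch only builds the text of a definition without checking it against DataSift.

```python
from typing import Iterable, List


def contains(target: str, phrase: str) -> str:
    """Build a `contains` predicate (hypothetical helper)."""
    return f'{target} contains "{phrase}"'


def _join(terms: Iterable[str], operator: str) -> str:
    items: List[str] = list(terms)
    if not items:
        # Slide 10: a definition needs at least one term.
        raise ValueError("at least one term is required")
    return f" {operator} ".join(items)


def any_of(terms: Iterable[str]) -> str:
    """Join terms with OR and parenthesise the group."""
    return f"({_join(terms, 'OR')})"


def all_of(terms: Iterable[str]) -> str:
    """Join terms with AND."""
    return _join(terms, "AND")


brands = any_of(contains("interaction.content", b) for b in ("Nokia", "Motorola", "Palm"))
definition = all_of([brands, contains("interaction.content", "phone")])
print(definition)
```

Parenthesising every OR group keeps the intended grouping explicit, so the result does not depend on how the language ranks AND against OR.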
  16. API Calls
      API calls are available for most DataSift functionality.
      All of these API calls are available through a semi-RESTful interface, similar to the Twitter API.
      Supported data formats include JSON, JSONP, XML and PHP (serialized).
      Each call is fully documented on the DataSift Support site.
  17. Retrieving Stream Data
      Once you have configured your stream with a definition and verified that it is correct, you can connect to your stream through a number of methods:
      The JSON API is simple and similar to how you would access Twitter Search.
      The HTTP Stream is similar to the Twitter firehose, giving a constant stream of data through a single connection.
      WebSockets is similar to this but meant for client-side connections through supported web browsers.
      RSS is also available, recommended for lower volume feeds only.
      All services are fully documented on the DataSift Support site.
  18. Questions
      You can get more help, support, examples and user content on the DataSift Support website.
      You can also ask us on Twitter:
      @datasift