Small pieces loosely joined

An experiment in a distributed approach to processing the real-time data generated by a large-scale social media campaign. Presented at Cambridge Geek Nights 13.

Speaker notes:
  • Take the adjectives and store a running total in Redis to create long-timeline tag clouds.
  • Pull out @replies and RTs and throw them into Neo4j, a graph database, for post-competition analysis.
  • Hook an Arduino up to IRC to receive Mailchimp subscriptions and create a physical visualisation in the office (e.g. a glow ball).

1. Small Pieces Loosely Joined #cgn13
2. or... A practical example of processing real-time data with a distributed agent network (Warning: does not contain real code)
3. Red Gate
4. 12th October 2011
5. Email marketing
6. Mailchimp webhook

```
"type": "subscribe",
"fired_at": "2009-03-26 21:35:57",
"data[id]": "8a25ff1d98",
"data[list_id]": "a6b5da1054",
"data[email]": "api@mailchimp.com",
"data[email_type]": "html",
"data[merges][EMAIL]": "api@mailchimp.com",
"data[merges][FNAME]": "MailChimp",
"data[merges][LNAME]": "API",
"data[merges][INTERESTS]": "Group1,Group2",
"data[ip_opt]": "10.20.10.30",
"data[ip_signup]": "10.20.10.30"
```
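
The webhook arrives as flat form fields with bracketed names. PHP turns names such as `data[merges][FNAME]` into nested arrays in `$_POST`, so the pump's `json_encode($_POST)` emits nested JSON, which is what lets the IRC valve look up the subscriber's first name under `data` → `merges` → `FNAME`. As a rough illustration of that nesting step, here is a minimal Ruby sketch; `nest_params` is a hypothetical helper (not from the talk or any library), and it assumes named segments only, with no PHP-style `[]` array suffixes.

```ruby
# Hypothetical helper: nest flat "a[b][c]" form-field names the way PHP
# builds $_POST. Assumes named segments only (no "[]" array suffixes).
def nest_params(flat)
  flat.each_with_object({}) do |(key, value), root|
    # "data[merges][FNAME]" -> ["data", "merges", "FNAME"]
    *path, leaf = key.split('[').map { |part| part.chomp(']') }
    # walk down (creating hashes as needed), then set the leaf value
    node = path.reduce(root) { |hash, part| hash[part] ||= {} }
    node[leaf] = value
  end
end

record = nest_params({
  'type' => 'subscribe',
  'data[merges][FNAME]' => 'MailChimp',
  'data[merges][LNAME]' => 'API'
})
puts record['data']['merges']['FNAME'] # prints MailChimp
```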
7. Pump the callbacks into a message bus...
8. Messaging
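9. mailchimp-pump.php

```php
// $channel: an already-open AMQP channel (php-amqplib assumed; setup not on the slide)
$json = json_encode($_POST);
$msg = new AMQPMessage($json);
// publish to the 'mailchimp' exchange as morat.campaign.mailchimp.[type]
$channel->basic_publish($msg, 'mailchimp', 'morat.campaign.mailchimp.' . $_POST['type']);
```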
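
Because the webhook's `type` field is copied straight into the routing key, it may be worth checking it before publishing, so that a forged POST cannot smuggle dots or `#` wildcards into the key. Here is a minimal Ruby sketch of the pump's message-building step under that assumption. `pump_message` is hypothetical, the AMQP publish itself is left out, and `KNOWN_TYPES` is the list of event types in Mailchimp's webhook documentation as we recall it (worth re-checking against the current docs).

```ruby
require 'json'

# Assumed list of Mailchimp webhook event types (re-check the current docs).
KNOWN_TYPES = %w[subscribe unsubscribe profile upemail cleaned campaign].freeze

# Hypothetical pump step: returns [routing_key, json_body] for the bus.
# `campaign` is assumed to be a single routing-key word (no dots).
def pump_message(params, campaign: 'campaign')
  type = params.fetch('type')
  unless KNOWN_TYPES.include?(type)
    raise ArgumentError, "unexpected webhook type: #{type}"
  end
  ["morat.#{campaign}.mailchimp.#{type}", JSON.generate(params)]
end

key, body = pump_message({ 'type' => 'subscribe', 'data[email]' => 'api@mailchimp.com' })
puts key # prints morat.campaign.mailchimp.subscribe
```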
10. I’d like to watch the stream on IRC...
11. Valve
    - Subscribe to the mailchimp exchange: `morat.campaign.mailchimp.#`
    - Translate to plain English for IRC
    - Inject into the irc exchange with routing key `morat.irc.[channel]`
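
The trailing `#` in the valve's binding key is an AMQP topic-exchange wildcard: binding keys are dot-separated words, `*` matches exactly one word and `#` matches zero or more. The broker does this matching itself. The Ruby sketch below (a hypothetical `topic_match?`, written only to illustrate the rule) makes the semantics concrete.

```ruby
# Illustrative AMQP topic matching: '*' = exactly one word, '#' = zero or more.
def topic_match?(binding_key, routing_key)
  match_words(binding_key.split('.'), routing_key.split('.'))
end

def match_words(pattern, words)
  return words.empty? if pattern.empty?
  head, *rest = pattern
  if head == '#'
    # '#' may absorb any number of words: try every split point
    (0..words.size).any? { |i| match_words(rest, words.drop(i)) }
  else
    !words.empty? && (head == '*' || head == words.first) &&
      match_words(rest, words.drop(1))
  end
end

puts topic_match?('morat.campaign.mailchimp.#', 'morat.campaign.mailchimp.subscribe') # prints true
puts topic_match?('morat.irc.*', 'morat.irc.redgaters.extra')                        # prints false
```

One consequence worth keeping in mind: a binding of `morat.twitter.search.#` would also match keys such as `morat.twitter.search.cgn13.adjectives`, so a valve that only wants raw tweets is safer binding to `morat.twitter.search.*`.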
12. mailchimp-irc-valve.rb

```ruby
# record: the webhook's JSON, already parsed into nested hashes
case record['type']
when 'subscribe'
  output :irc, "#{record['data']['merges']['FNAME']} #{record['data']['merges']['LNAME']} has joined the list"
when 'unsubscribe'
  output :irc, "#{record['data']['merges']['FNAME']} #{record['data']['merges']['LNAME']} has left the list"
# ...
end
```
13. Create a Sink to send the messages to IRC...
14. irc-sink.pl

```perl
$q = $amq->channel(1)->queue('morat.irc.' . $channel, {
    passive => 0, durable => 0, auto_delete => 1, exclusive => 0,
})->subscribe(sub {
    my ($payload, $meta) = @_;
    # the IRC channel is the last dot-separated word of the queue name
    my ($channel) = $meta->{queue} =~ /\.([^.]+)$/;
    $irc->yield(privmsg => '#' . $channel, GREEN . $payload);
});
```
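
The sink's last step boils down to one raw IRC command per message. Here is a minimal Ruby sketch of that step with hypothetical helper names. It assumes the colour prefix `\x03` followed by `03` (mIRC green, which we take to be what `GREEN` from Perl's IRC::Utils expands to). It also adds two safeguards the Perl fragment does not show: CR/LF in the payload are replaced, so a tweet cannot inject extra IRC commands, and the line is capped at the 512-byte IRC limit (RFC 2812), CRLF included.

```ruby
GREEN = "\x0303" # mIRC colour code 03 (green); assumed to match IRC::Utils

# Hypothetical: the IRC channel is the last word of the queue name.
def channel_for(queue_name)
  queue_name[/\.([^.]+)\z/, 1]
end

# Hypothetical: build one raw PRIVMSG line for the IRC server.
def privmsg_line(channel, text)
  line = "PRIVMSG ##{channel} :#{GREEN}#{text.gsub(/[\r\n]+/, ' ')}"
  # stay within 512 bytes (510 + CRLF); scrub drops a split multibyte char
  line.byteslice(0, 510).scrub('') + "\r\n"
end

print privmsg_line(channel_for('morat.irc.redgaters'), 'MailChimp API has joined the list')
```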
15. Where have we got to?
    - Pump: Mailchimp webhook (HTTP POST) → `morat.[campaign].mailchimp.[type]` (JSON)
    - Valve: `morat.[campaign].mailchimp.[type]` (JSON) → `morat.irc.[campaign]` (text)
    - Sink: `morat.irc.[campaign]` (text) → IRC server
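
The three roles share nothing but routing keys, so the whole chain can be exercised without a broker. The Ruby sketch below swaps RabbitMQ for a toy in-memory topic exchange (`ToyBus`, hypothetical, handling `*` anywhere and `#` only as the final word). It wires up a pump, a valve and a sink along the lines of the summary above, with an array standing in for the IRC server.

```ruby
require 'json'

# Toy stand-in for a topic exchange (hypothetical, for illustration only).
# '*' matches one word anywhere; '#' is supported only as the last word.
class ToyBus
  def initialize
    @subscriptions = []
  end

  def subscribe(binding_key, &handler)
    @subscriptions << [compile(binding_key), handler]
  end

  # Deliver synchronously to every matching subscriber.
  def publish(routing_key, body)
    @subscriptions.each do |pattern, handler|
      handler.call(routing_key, body) if pattern.match?(routing_key)
    end
  end

  private

  def compile(binding_key)
    words = binding_key.split('.')
    trailing_hash = words.last == '#'
    words.pop if trailing_hash
    body = words.map { |w| w == '*' ? '[^.]+' : Regexp.escape(w) }.join('\.')
    Regexp.new("\\A#{body}#{trailing_hash ? '(\..+)?' : ''}\\z")
  end
end

bus = ToyBus.new
irc = [] # stands in for the IRC server

# Valve: JSON webhook record -> plain English on morat.irc.[campaign]
bus.subscribe('morat.*.mailchimp.#') do |key, body|
  record = JSON.parse(body)
  next unless %w[subscribe unsubscribe].include?(record['type'])
  merges = record['data']['merges']
  verb = record['type'] == 'subscribe' ? 'joined' : 'left'
  campaign = key.split('.')[1]
  bus.publish("morat.irc.#{campaign}", "#{merges['FNAME']} #{merges['LNAME']} has #{verb} the list")
end

# Sink: text on morat.irc.[channel] -> "#channel text"
bus.subscribe('morat.irc.*') { |key, text| irc << "##{key.split('.').last} #{text}" }

# Pump: what the webhook handler publishes (already nested, as PHP does)
webhook = { 'type' => 'subscribe', 'data' => { 'merges' => { 'FNAME' => 'MailChimp', 'LNAME' => 'API' } } }
bus.publish('morat.cgn13.mailchimp.subscribe', JSON.generate(webhook))

puts irc # prints #cgn13 MailChimp API has joined the list
```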
16. That’s cool, but hey, it would be great to see #campaign tweets as well...
17. twitter-search-pump.rb

```ruby
terms = keywords.split(',')
TweetStream::Client.new.track(*terms) do |status|
  terms.each do |searchterm|
    # Twitter's track matching is case-insensitive, so match the same way here
    next unless status.text.downcase.include?(searchterm.downcase)
    # routing-key word: strip every space and '#' ("#cgn13" -> "cgn13")
    key = "morat.twitter.search.#{searchterm.delete(' #')}"
    log.debug "Sending: #{status.user.screen_name} :: #{status.text} :: #{key}"
    broker.exchange.publish JSON.generate(status), :routing_key => key
  end
end
```
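
The pump's per-keyword fan-out can be pulled out into a pure function, which makes it easy to test without a Twitter connection. Here is a minimal Ruby sketch (hypothetical `search_routing_keys`). It approximates Twitter's matching with a case-insensitive substring test, which is a simplification of how the streaming API's `track` really matches. It also lowercases the routing-key word, an assumption made so that `#CGN13` and `#cgn13` do not end up on different keys, since AMQP compares routing-key words case-sensitively.

```ruby
# Hypothetical helper: which routing keys should this tweet go out under?
# `keywords` is the same comma-separated string the pump tracks.
def search_routing_keys(text, keywords)
  haystack = text.downcase
  terms = keywords.split(',').map(&:strip).reject(&:empty?)
  terms.select { |term| haystack.include?(term.downcase) }
       .map { |term| "morat.twitter.search.#{term.delete(' #').downcase}" }
       .uniq
end

p search_routing_keys('Red Gate at #CGN13 tonight', '#cgn13,red gate')
# prints ["morat.twitter.search.cgn13", "morat.twitter.search.redgate"]
```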
18. twitter-irc-valve.rb

```ruby
case routing_key
when 'morat.twitter.@neildavidson.list.redgaters'
  output :irc, "RG chatter: #{record['user']['screen_name']} tweeted: #{record['text']}",
    :routing_key => 'morat.irc.redgaters'
when /\Amorat\.twitter\.search\.(.+)\z/
  # one IRC channel per search term ($1 is the term from the routing key)
  output :irc, "#{record['user']['screen_name']} tweeted: #{record['text']}",
    :routing_key => "morat.irc.#{$1}"
end
```
19. I feel the urge to graph...
20. Thanks @garethr
21. Valve
    - Subscribe to the mailchimp exchange: `morat.[campaign].mailchimp.#`
    - Translate to Graphite format: `[value] [timestamp]`
    - Inject into the graphite exchange with a routing key based on the sample window: `10sec.[campaign].mailchimp.[action].count`
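
Carbon (Graphite's storage daemon) can read from AMQP. When it is configured to take the metric name from the routing key (assumed here: `AMQP_METRIC_NAME_IN_BODY = False` in carbon.conf), each message body only needs `[value] [timestamp]`, as on the slide. Over carbon's plaintext listener (port 2003 by default), the metric path goes in front instead. Here is a minimal Ruby sketch of both shapes. The helper names are hypothetical, and the key layout follows the `10sec.[campaign].mailchimp.[action].count` pattern above.

```ruby
# Hypothetical helpers for the two carbon message shapes.

# AMQP, metric name in the routing key: body is "[value] [timestamp]".
def graphite_amqp(window, campaign, action, value, at = Time.now)
  key = "#{window.delete(' ')}.#{campaign}.mailchimp.#{action}.count"
  [key, "#{value} #{at.to_i}"]
end

# Plaintext protocol, one line per datapoint: "[path] [value] [timestamp]\n".
def graphite_plaintext(path, value, at = Time.now)
  "#{path} #{value} #{at.to_i}\n"
end

key, body = graphite_amqp('10 sec', 'cgn13', 'subscribe', 7, Time.at(1318420800))
puts key  # prints 10sec.cgn13.mailchimp.subscribe.count
puts body # prints 7 1318420800
```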
22. But let’s make it cool...
23. Complex Event Processing
24. mailchimp-graphite-valve.rb

```ruby
%w{ subscribe unsubscribe campaign }.each do |action|
  [ '10 sec', '1 min', '5 min', '15 min' ].each do |window|
    # one Esper statement per action and window: count events per time batch
    valve.register "SELECT count(*) FROM MailchimpEvent(type='#{action}').win:time_batch(#{window})",
      ( Listener.new(valve) do |agent, event|
          # Graphite body is "[value] [timestamp]"; the routing key names the metric
          valve.output :graphite, "#{event.get('count(*)')} #{Time.now.to_i}",
            :routing_key => window.delete(' ') + ".morat.#{valve.application}.mailchimp.#{action}"
        end )
  end
end
```
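
To build intuition for what each `win:time_batch` statement reports, here is a simplified Ruby stand-in for a tumbling (batch) window counter. It is not Esper. Events carry explicit timestamps so the example is deterministic, and batches are aligned to multiples of the window length. A batch is emitted when the next event falls outside it, or on an explicit `flush` (where a real system would use a timer), and empty intervals produce no output. `TimeBatchCounter` is a hypothetical name.

```ruby
# Simplified tumbling-window counter (hypothetical; not Esper's implementation).
class TimeBatchCounter
  def initialize(window_seconds, &on_batch)
    @window = window_seconds
    @on_batch = on_batch # called with (batch_end_time, count)
    @batch_start = nil
    @count = 0
  end

  # Record one event at time `at` (seconds); events must arrive in order.
  def event(at)
    start = at - (at % @window)
    flush if @batch_start && start != @batch_start # event opens a new batch
    @batch_start = start
    @count += 1
  end

  # Emit the current batch, if any (a real system would call this on a timer).
  def flush
    return unless @batch_start
    @on_batch.call(@batch_start + @window, @count)
    @batch_start = nil
    @count = 0
  end
end

out = []
counters = { '10sec' => 10, '1min' => 60 }.map do |name, seconds|
  TimeBatchCounter.new(seconds) { |ends_at, n| out << "#{name} #{n} #{ends_at}" }
end
[0, 3, 9, 12, 61].each { |t| counters.each { |c| c.event(t) } }
counters.each(&:flush)
puts out
# prints, one per line: 10sec 3 10, 10sec 1 20, 1min 4 60, 10sec 1 70, 1min 1 120
```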
25. Why use CEP?

```
// find the sum of retweets of the last 5 tweets which saw at least 10 retweets
SELECT sum(retweets) from TweetEvent(retweets >= 10).win:length(5)

// find max, min and average number of retweets for a sliding 60 second window of time
SELECT max(retweets), min(retweets), avg(retweets) FROM TweetEvent.win:time(60 sec)

// compute number of retweets for all tweets in 10 second batches
SELECT sum(retweets) from TweetEvent.win:time_batch(10 sec)

// number of retweets, grouped by timezone, buffered in 10 second increments
SELECT timezone, sum(retweets) from TweetEvent.win:time_batch(10 sec) group by timezone

// compute the sum of retweets in a sliding 60 second window, and emit the sum every 30 events
SELECT sum(retweets) from TweetEvent.win:time(60 sec) output snapshot every 30 events

// every 10 seconds, report timezones which accumulated more than 10 retweets
SELECT timezone, sum(retweets) from TweetEvent.win:time_batch(10 sec) group by timezone having sum(retweets) > 10
```

Courtesy @igrigorik: http://www.igvita.com/2011/05/27/streamsql-event-processing-with-esper/
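
Two of the window types above are sketched here in plain Ruby to show what the engine keeps in memory. `win:time(60 sec)` retains only events from the last 60 seconds and re-aggregates as they expire. A filter followed by `win:length(5)` keeps the last five events that passed the filter. The class and method names are hypothetical, timestamps are passed in explicitly, and events are assumed to arrive in time order.

```ruby
# Sliding time window, like win:time(60 sec) (hypothetical sketch).
class SlidingTimeWindow
  def initialize(span_seconds)
    @span = span_seconds
    @events = [] # [timestamp, retweets], oldest first
  end

  # Add an event and return max/min/avg over the events still in the window.
  def add(at, retweets)
    @events << [at, retweets]
    # expire events that arrived `span` or more seconds before `at`
    @events.shift while @events.first[0] <= at - @span
    values = @events.map(&:last)
    { max: values.max, min: values.min, avg: values.sum.fdiv(values.size) }
  end
end

# Filter + length window, like TweetEvent(retweets >= 10).win:length(5):
# sum of the last `n` events that passed the filter.
def filtered_length_sum(retweet_counts, n: 5, min: 10)
  retweet_counts.select { |r| r >= min }.last(n).sum
end

window = SlidingTimeWindow.new(60)
window.add(0, 4)
window.add(30, 10)
stats = window.add(70, 1) # the event at t=0 has expired by now
puts stats[:avg]                                        # prints 5.5
puts filtered_length_sum([12, 3, 10, 50, 11, 9, 20, 30]) # prints 121
```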
26. Is there really a correlation?
27. Statistical Computing
28. Valve
    - Grab the raw data for the window from Graphite via REST
    - Create a scatter graph using R and calculate the correlation
    - Inject the correlation into the graphite exchange
29. twitter-correlation-valve.rb

```ruby
require 'rsruby'
# ...
r.jpeg(filename)                        # write the plot to a JPEG file
r.assign('xs', data[1])
r.assign('ys', data[2])
fit = r.lm('ys ~ xs')                   # least-squares fit of ys against xs
r.plot({ 'x' => data[1], 'y' => data[2], 'xlab' => label[1], 'ylab' => label[2] })
cor = r.cor(data[1], data[2]).to_s
r.title("Correlation: " + cor)
r.abline(fit['coefficients']['(Intercept)'], fit['coefficients']['xs']) # fitted line
r.eval_R("dev.off()")                   # close the JPEG device
```
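
The two numbers the R valve derives can also be computed directly, which may help when checking the R output. `cor(x, y)` defaults to Pearson's correlation coefficient, and `lm(ys ~ xs)` fits an ordinary least-squares line whose intercept and slope are what `abline` draws. Here is a minimal pure-Ruby sketch (hypothetical `pearson_and_fit`, no RSRuby required). Its results are not meaningful (typically NaN) if either series is constant.

```ruby
# Pearson's r plus the ordinary least-squares line y = intercept + slope * x
# (hypothetical helper; mirrors cor(x, y) and lm(ys ~ xs) for two series).
def pearson_and_fit(xs, ys)
  unless xs.size == ys.size && xs.size >= 2
    raise ArgumentError, 'need two equal-length series with at least 2 points'
  end
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = xs.sum { |x| (x - mean_x)**2 }
  syy = ys.sum { |y| (y - mean_y)**2 }
  slope = sxy / sxx
  { r: sxy / Math.sqrt(sxx * syy), slope: slope, intercept: mean_y - slope * mean_x }
end

fit = pearson_and_fit([1, 2, 3, 4], [2, 4, 6, 8])
puts "Correlation: #{fit[:r]}" # prints Correlation: 1.0
```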
30. Let’s add some real-time visualisation...
31. WebSockets
32. Valve
    - Subscribe to the twitter exchange: `morat.twitter.search.[keyword]`
    - Extract adjectives using EngTagger
    - Inject adjectives into the twitter exchange with routing key `morat.twitter.search.[keyword].adjectives` as `[adjective] [count]`
33. twitter-sentiment-valve.rb

```ruby
require 'engtagger'
# ...
log.debug "Received tweet from #{record['user']['screen_name']} on #{routing_key}"
# POS-tag the tweet, keep the adjective-tagged words, then strip the tags
adjectives = @parser.add_tags(record['text']).scan(EngTagger::ADJ).map do |n|
  @parser.strip_tags(n)
end
# count each stemmed adjective, skipping blank tokens
ret = Hash.new(0)
adjectives.each do |n|
  n = @parser.stem(n)
  ret[n] += 1 unless n =~ /\A\s*\z/
end
```
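
The counting and output step of the sentiment valve can be tried without EngTagger. In this minimal Ruby sketch, a tiny, hypothetical `ADJECTIVES` word list stands in for part-of-speech tagging, so it will miss most real adjectives. Each message body is `[adjective] [count]` on the `morat.twitter.search.[keyword].adjectives` key, as described on the slide. `adjective_messages` is a hypothetical name.

```ruby
# Illustrative only: a real valve would tag words with EngTagger instead.
ADJECTIVES = %w[great awesome bad happy late free].freeze

# Returns [routing_key, body] pairs, one per distinct adjective in the tweet.
def adjective_messages(text, keyword)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z']+/) { |word| counts[word] += 1 if ADJECTIVES.include?(word) }
  key = "morat.twitter.search.#{keyword}.adjectives"
  counts.map { |adjective, n| [key, "#{adjective} #{n}"] }
end

adjective_messages('Great talk, great beer, late train', 'cgn13').each do |key, body|
  puts "#{key} #{body}"
end
# prints:
# morat.twitter.search.cgn13.adjectives great 2
# morat.twitter.search.cgn13.adjectives late 1
```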
34. Sink
    - Subscribe to the twitter exchange: `morat.twitter.search.[keyword].adjectives`
    - Use node.js and Socket.IO to send data to the web client via WebSockets
    - Visualise with Processing.js in the web browser
35. twitter-sentiment-sink.js

```javascript
io.sockets.on('connection', function (socket) {
  amqp_connection.on('ready', function () {
    var queue = amqp_connection.queue('');  // server-named queue
    amqp_connection.exchange('twitter', { type: 'topic', passive: false, durable: true, autoDelete: true }, function (exchange) {
      // routing_key: morat.twitter.search.[keyword].adjectives
      queue.bind(exchange, routing_key);
      queue.subscribe(function (message) {
        // forward each "[adjective] [count]" message to this browser
        socket.emit('data', { text: message.data.toString() });
      });
    });
  });
});
```
36. twitter-sentiment-sink.html

```html
<h1>Twitter Sentiment</h1>
<div id="container">
  <canvas id="twitter-sentiment-sink" data-processing-sources="twitter-sentiment-sink.pde" width="800" height="600"></canvas>
</div>
<script src="/socket.io/socket.io.js"></script>
<script type="text/javascript">
  var socket = io.connect('http://localhost');
  socket.on('data', function (data) {
    var pjs = Processing.getInstanceById('twitter-sentiment-sink');
    // message is "[adjective] [count]"; only the adjective is plotted
    pjs.addDatum(data.text.split(' ')[0]);
  });
</script>
```
37. @ennui2342 | www.morat.co.uk | polis.ecafe.org
