Small pieces loosely joined
 


An experiment in a distributed approach to processing the real-time data generated by a large scale social media campaign. Presented at Cambridge Geek Nights 13.

Usage Rights: CC Attribution-NonCommercial-ShareAlike License

  • Take the adjectives and store a running total in Redis to create long-timeline tag clouds
  • Pull out @replies and RTs and throw them into Neo4j, a graph database, for post-competition analysis
  • Hook an Arduino up to IRC to receive Mailchimp subscriptions and create a physical visualisation in the office (e.g. a glow ball)

Presentation Transcript

  • Small Pieces Loosely Joined #cgn13
  • or... A practical example of processing real-time data with a distributed agent network (Warning: does not contain real code)
  • Red Gate
  • 12th October 2011
  • eMail Marketing
  • Mailchimp webhook
      "type": "subscribe",
      "fired_at": "2009-03-26 21:35:57",
      "data[id]": "8a25ff1d98",
      "data[list_id]": "a6b5da1054",
      "data[email]": "api@mailchimp.com",
      "data[email_type]": "html",
      "data[merges][EMAIL]": "api@mailchimp.com",
      "data[merges][FNAME]": "MailChimp",
      "data[merges][LNAME]": "API",
      "data[merges][INTERESTS]": "Group1,Group2",
      "data[ip_opt]": "10.20.10.30",
      "data[ip_signup]": "10.20.10.30"
  • Pump the callbacks into a message bus...
  • Messaging
  • mailchimp-pump.php
      // Forward the raw webhook POST to the message bus as JSON
      $json = json_encode($_POST);
      $msg = new AMQPMessage($json);
      $channel->basic_publish($msg, 'mailchimp', "morat.campaign.mailchimp." . $_POST['type']);
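The webhook arrives as form fields with bracketed names such as `data[merges][FNAME]`. PHP nests these automatically when it builds `$_POST`, which is why the valves can later index `record['data']['merges']['FNAME']`. As a rough illustration of that nesting step outside PHP, here is a minimal Ruby sketch; `nest_form_fields` is a hypothetical helper, not something from the talk.

```ruby
require 'json'

# Expands bracketed form keys such as "data[merges][FNAME]" into nested hashes.
def nest_form_fields(flat)
  flat.each_with_object({}) do |(key, value), nested|
    # "data[merges][FNAME]" -> ["data", "merges", "FNAME"]
    path = key.scan(/[^\[\]]+/)
    leaf = path.pop
    node = path.inject(nested) { |hash, part| hash[part] ||= {} }
    node[leaf] = value
  end
end

# A cut-down version of the webhook fields shown on the earlier slide.
flat = {
  'type' => 'subscribe',
  'data[email]' => 'api@mailchimp.com',
  'data[merges][FNAME]' => 'MailChimp',
  'data[merges][LNAME]' => 'API'
}

record = nest_form_fields(flat)
payload = JSON.generate(record)
routing_key = "morat.campaign.mailchimp.#{record['type']}"
```

Here `payload` is the JSON body a pump would publish and `routing_key` comes out as `morat.campaign.mailchimp.subscribe`.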
  • I’d like to watch the stream on IRC...
  • Valve
    • Subscribe to mailchimp exchange morat.campaign.mailchimp.#
    • Translate to plain English for IRC
    • Inject into irc exchange with routing key morat.irc.[channel]
  • mailchimp-irc-valve.rb
      # Turn each Mailchimp event into a one-line message for IRC
      case record['type']
      when 'subscribe'
        output :irc, "#{record['data']['merges']['FNAME']} #{record['data']['merges']['LNAME']} has joined the list"
      when 'unsubscribe'
        output :irc, "#{record['data']['merges']['FNAME']} #{record['data']['merges']['LNAME']} has left the list"
      ...
  • Create a Sink to send the messages to IRC...
  • irc-sink.pl
      $q = $amq->channel(1)->queue('morat.irc.' . $channel, {
          passive => 0,
          durable => 0,
          auto_delete => 1,
          exclusive => 0,
      })->subscribe(sub {
          my ($payload, $meta) = @_;
          # The IRC channel name is the last word of the queue name
          my ($channel) = $meta->{queue} =~ /\.([^.]+)$/;
          $irc->yield('privmsg', '#' . $channel, GREEN . $payload);
      });
  • Where have we got to?
    • Pump: Mailchimp webhook (HTTP POST) > morat.[campaign].mailchimp.[type] (JSON)
    • Valve: morat.campaign.mailchimp.[type] (JSON) > morat.irc.[campaign] (Text)
    • Sink: morat.irc.[campaign] (Text) > IRC server
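The pump, valve and sink only know about routing keys, so the wiring rests on AMQP topic matching: in a binding pattern `*` stands for exactly one dot-separated word and `#` for zero or more. The following is a minimal in-memory sketch of that idea with no broker involved. `TopicBus`, `topic_match?` and `match_words` are hypothetical names for illustration, and the message text is invented.

```ruby
require 'json'

# Returns true when an AMQP-style topic pattern matches a routing key.
# '*' matches exactly one dot-separated word, '#' matches zero or more.
def topic_match?(pattern, key)
  match_words(pattern.split('.'), key.split('.'))
end

def match_words(pat, words)
  return words.empty? if pat.empty?
  head, *rest = pat
  case head
  when '#'
    # '#' may swallow 0..n words: try every possible split point.
    (0..words.length).any? { |i| match_words(rest, words[i..-1]) }
  when '*'
    !words.empty? && match_words(rest, words[1..-1])
  else
    !words.empty? && words.first == head && match_words(rest, words[1..-1])
  end
end

# Hypothetical stand-in for a topic exchange: handlers are bound by pattern
# and called for every published message whose routing key matches.
class TopicBus
  def initialize
    @bindings = []
  end

  def subscribe(pattern, &handler)
    @bindings << [pattern, handler]
  end

  def publish(routing_key, payload)
    @bindings.each do |pattern, handler|
      handler.call(routing_key, payload) if topic_match?(pattern, routing_key)
    end
  end
end

bus = TopicBus.new
irc_lines = []

# Sink: collects text bound for an IRC channel.
bus.subscribe('morat.irc.*') { |key, text| irc_lines << "#{key}: #{text}" }

# Valve: JSON in, plain text out, re-published under a new routing key.
bus.subscribe('morat.campaign.mailchimp.#') do |_key, json|
  record = JSON.parse(json)
  bus.publish('morat.irc.campaign', "#{record['type']} event received")
end

# Pump: what the webhook handler would publish.
bus.publish('morat.campaign.mailchimp.subscribe', JSON.generate({ 'type' => 'subscribe' }))
```

After the final `publish`, `irc_lines` holds one entry, `morat.irc.campaign: subscribe event received`. Each piece can be replaced or duplicated without the others noticing, which is the point of the loose joining.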
  • That’s cool, but hey it would be great to see #campaign tweets as well...
  • twitter-search-pump.rb
      TweetStream::Client.new.track(keywords.split(',')) do |status|
        keywords.split(',').each do |searchterm|
          if status.text.match(searchterm)
            # Strip spaces and '#' so the term is usable in a routing key
            searchterm.sub!(' ', '')
            searchterm.sub!('#', '')
            log.debug "Sending: #{status.user.screen_name} :: #{status.text} :: morat.twitter.search.#{searchterm}"
            broker.exchange.publish JSON.generate(status), :routing_key => "morat.twitter.search.#{searchterm}"
          end
        end
      end
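The pump above derives a routing key from each tracked keyword by stripping spaces and `#`. A standalone sketch of just that step follows, without the Twitter or AMQP clients. `routing_key_for` and `routing_keys_for` are hypothetical helpers, and two choices here are assumptions that differ from the slide: matching is case-insensitive, and every space or `#` is removed rather than only the first.

```ruby
# Turns a tracked keyword into a routing key by removing all spaces
# and '#' characters, without mutating the original string.
def routing_key_for(keyword)
  "morat.twitter.search.#{keyword.delete(' #')}"
end

# Returns the routing keys for every tracked keyword found in the tweet text.
# Matching is case-insensitive (an assumption of this sketch).
def routing_keys_for(text, keywords)
  keywords.split(',')
          .select { |kw| text.downcase.include?(kw.downcase) }
          .map { |kw| routing_key_for(kw) }
end

keys = routing_keys_for('Loving #cgn13 at Red Gate', '#cgn13,red gate,sql')
```

With these inputs `keys` is `["morat.twitter.search.cgn13", "morat.twitter.search.redgate"]`, so one tweet can fan out to several search topics.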
  • twitter-irc-valve.rb
      case routing_key
      when 'morat.twitter.@neildavidson.list.redgaters'
        output :irc, "RG chatter: #{record['user']['screen_name']} tweeted: #{record['text']}", :routing_key => "morat.irc.redgaters"
      else
        # Route search results to the IRC channel named after the search term
        searchterm = routing_key.match(/morat\.twitter\.search\.(.+)/)[1]
        output :irc, "#{record['user']['screen_name']} tweeted: #{record['text']}", :routing_key => "morat.irc.#{searchterm}"
      end
  • I feel the urge to graph...
  • Thanks @garethr
  • Valve
    • Subscribe to mailchimp exchange morat.[campaign].mailchimp.#
    • Translate to Graphite format: [value] [timestamp]
    • Inject into graphite exchange with routing key based on sample window: 10sec.[campaign].mailchimp.[action].count
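To make the translation step concrete, here is a plain-Ruby sketch that counts events in fixed windows and emits one `[value] [timestamp]` body per window under the routing key described above. It is an illustration only: `batch_counts` is a hypothetical helper, windows are aligned to multiples of the window length, the timestamp used is the window start, and windows with no events are simply omitted. A real CEP engine may pick boundaries differently.

```ruby
# Counts events in fixed, non-overlapping windows and returns one
# [routing_key, body] pair per window, with body "<count> <window start>".
def batch_counts(timestamps, window_secs, campaign, action)
  counts = Hash.new(0)
  timestamps.each do |ts|
    window_start = ts - (ts % window_secs)
    counts[window_start] += 1
  end
  counts.sort.map do |window_start, count|
    ["#{window_secs}sec.#{campaign}.mailchimp.#{action}.count", "#{count} #{window_start}"]
  end
end

# Epoch seconds at which some hypothetical subscribe events arrived.
events = [1000, 1003, 1009, 1010, 1025]
messages = batch_counts(events, 10, 'campaign', 'subscribe')
```

For these five events the bodies are `3 1000`, `1 1010` and `1 1020`, each published under `10sec.campaign.mailchimp.subscribe.count`.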
  • But let’s make it cool...
  • Complex Event Processing
  • mailchimp-graphite-valve.rb
      # Register one counting query per action and per sample window
      %w{ subscribe unsubscribe campaign }.each do |action|
        [ '10 sec', '1 min', '5 min', '15 min' ].each do |window|
          valve.register "SELECT count(*) from MailchimpEvent(type='#{action}').win:time_batch(#{window})", (
            Listener.new(valve) do |agent, event|
              valve.output :graphite, "#{event.get('count(*)')}", :routing_key => window.delete(' ') + ".morat.#{valve.application}.mailchimp.#{action}"
            end
          )
        end
      end
  • Why use CEP?
      # find the sum of retweets of last 5 tweets which saw more than 10 retweets
      SELECT sum(retweets) from TweetEvent(retweets >= 10).win:length(5)
      # find max, min and average number of retweets for a sliding 60 second window of time
      SELECT max(retweets), min(retweets), avg(retweets) FROM TweetEvent.win:time(60 sec)
      # compute number of retweets for all tweets in 10 second batches
      SELECT sum(retweets) from TweetEvent.win:time_batch(10 sec)
      # number of retweets, grouped by timezone, buffered in 10 second increments
      SELECT timezone, sum(retweets) from TweetEvent.win:time_batch(10 sec) group by timezone
      # compute the sum of retweets in sliding 60 second window, and emit count every 30 events
      SELECT sum(retweets) from TweetEvent.win:time(60 sec) output snapshot every 30 events
      # every 10 seconds, report timezones which accumulated more than 10 retweets
      SELECT timezone, sum(retweets) from TweetEvent.win:time_batch(10 sec) group by timezone having sum(retweets) > 10
    Courtesy @igrigorik http://www.igvita.com/2011/05/27/streamsql-event-processing-with-esper/
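For a feel of what the first two queries compute, here is a hand-rolled Ruby sketch of a filtered length window and a sliding time window. It is not Esper and makes simplifying assumptions: `LengthWindow` and `TimeWindow` are hypothetical classes, events are assumed to arrive in timestamp order, and results are returned on every push rather than through listeners.

```ruby
# Keeps the last `size` values that pass a filter, like a filtered length window.
class LengthWindow
  def initialize(size, &filter)
    @size = size
    @filter = filter
    @values = []
  end

  # Adds a value if it passes the filter and returns the current window sum.
  def push(value)
    if @filter.call(value)
      @values << value
      @values.shift if @values.length > @size
    end
    @values.sum
  end
end

# Keeps [timestamp, value] pairs that are less than `span` seconds old.
class TimeWindow
  def initialize(span)
    @span = span
    @events = []
  end

  # Adds an event, expires old ones, and returns [max, min, average].
  def push(timestamp, value)
    @events << [timestamp, value]
    @events.reject! { |ts, _| ts <= timestamp - @span }
    values = @events.map { |_, v| v }
    [values.max, values.min, values.sum.to_f / values.length]
  end
end

# Sum of retweets over the last 5 tweets with at least 10 retweets.
retweets = [12, 3, 40, 10, 15, 11, 9, 20]
window = LengthWindow.new(5) { |r| r >= 10 }
sums = retweets.map { |r| window.push(r) }
```

Here `sums` ends up as `[12, 12, 52, 62, 77, 88, 88, 96]`: the values 3 and 9 are filtered out, and the final push evicts the oldest entry (12). Writing this bookkeeping by hand for every query is the work a CEP engine takes away.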
  • Is there really a correlation?
  • Statistical Computing
  • Valve
    • Grab raw data for window from graphite via REST
    • Create scatter graph using R and calculate correlation
    • Inject correlation into graphite exchange
  • twitter-correlation-valve.rb
      require 'rsruby'
      ...
      r.jpeg(filename)
      r.assign('xs', data[1])
      r.assign('ys', data[2])
      # Fit a straight line through the two series
      fit = r.lm('ys ~ xs')
      r.plot({ 'x' => data[1], 'y' => data[2], 'xlab' => label[1], 'ylab' => label[2] })
      cor = r.cor(data[1], data[2]).to_s
      r.title("Correlation: " + cor)
      r.abline(fit['coefficients']['(Intercept)'], fit['coefficients']['xs'])
      r.eval_R("dev.off()")
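The valve hands the arithmetic to R. For readers without R to hand, the two numbers it relies on, the Pearson correlation and the least-squares line, can be sketched in plain Ruby. This is an illustration of the underlying formulas rather than the talk's code; the helper names and the sample series are made up.

```ruby
# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum.to_f / values.length
end

# Pearson correlation coefficient between two equal-length series.
def pearson(xs, ys)
  mx = mean(xs)
  my = mean(ys)
  cov = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  sx = Math.sqrt(xs.sum { |x| (x - mx)**2 })
  sy = Math.sqrt(ys.sum { |y| (y - my)**2 })
  cov / (sx * sy)
end

# Least-squares line ys ~ xs, returned as [intercept, slope].
def linear_fit(xs, ys)
  mx = mean(xs)
  my = mean(ys)
  slope = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) } / xs.sum { |x| (x - mx)**2 }
  [my - slope * mx, slope]
end

# Made-up per-window counts: tweets seen versus list signups.
tweets_per_window = [1, 2, 3, 4, 5]
signups_per_window = [3, 5, 7, 9, 11]
correlation = pearson(tweets_per_window, signups_per_window)
intercept, slope = linear_fit(tweets_per_window, signups_per_window)
```

The sample data lies exactly on the line y = 1 + 2x, so the correlation comes out at 1 (up to floating-point rounding), with intercept 1 and slope 2. Real campaign data would be far noisier, which is why the slide asks whether there is a correlation at all.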
  • Let’s add some realtime visualisation...
  • Websockets
  • Valve
    • Subscribe to twitter exchange morat.twitter.search.[keyword]
    • Extract adjectives using engtagger
    • Inject adjectives into twitter exchange with routing key morat.twitter.search.[keyword].adjectives as: [adjective] [count]
  • twitter-sentiment-valve.rb
      require 'engtagger'
      ...
      log.debug "Received tweet from #{record['user']['screen_name']} on #{routing_key}"
      # Tag the tweet text and keep only the adjectives
      adjectives = @parser.add_tags(record['text']).scan(EngTagger::ADJ).map do |n|
        @parser.strip_tags(n)
      end
      # Count each stemmed adjective, skipping blank results
      ret = Hash.new(0)
      adjectives.each do |n|
        n = @parser.stem(n)
        ret[n] += 1 unless n =~ /\A\s*\z/
      end
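The counting and formatting half of this valve can be tried without a tagger installed. In the sketch below the part-of-speech tagging is replaced by a tiny hand-written adjective list, which is purely an assumption for illustration and no substitute for a real tagger. `ADJECTIVES`, `extract_adjectives` and `adjective_messages` are hypothetical names.

```ruby
require 'set'

# Hypothetical stand-in for a part-of-speech tagger: a fixed adjective list.
ADJECTIVES = Set.new(%w[great cool slow awesome bad])

# Returns the adjectives found in a tweet, lower-cased, in order of appearance.
def extract_adjectives(text)
  text.downcase.scan(/[a-z']+/).select { |word| ADJECTIVES.include?(word) }
end

# Tallies adjectives and formats each as "[adjective] [count]".
def adjective_messages(text)
  counts = Hash.new(0)
  extract_adjectives(text).each { |adj| counts[adj] += 1 }
  counts.map { |adj, count| "#{adj} #{count}" }
end

messages = adjective_messages('Great talk, great demo, cool graphs')
```

Here `messages` is `["great 2", "cool 1"]`, one message per adjective in the `[adjective] [count]` shape described on the valve slide.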
  • Sink
    • Subscribe to twitter exchange morat.twitter.search.[keyword].adjectives
    • Use node.js and Socket.IO to send data to web client via Websockets
    • Visualise with processing.js in web browser
  • twitter-sentiment-sink.js
      io.sockets.on('connection', function (socket) {
        amqp_connection.on('ready', function () {
          var queue = amqp_connection.queue();
          exchange = amqp_connection.exchange('twitter', { type: 'topic', passive: false, durable: true, autoDelete: true }, function (exchange) {
            queue.bind(exchange, routing_key);
            // Relay every message from the bus to the connected browser
            queue.subscribe(function (message) {
              socket.emit('data', { text: message.data.toString() });
            });
          });
        });
      });
  • twitter-sentiment-sink.html
      <H1>Twitter Sentiment</H1>
      <div id="container">
        <canvas id="twitter-sentiment-sink" data-processing-sources="twitter-sentiment-sink.pde" WIDTH=800 HEIGHT=600></canvas>
      </div>
      <script src="/socket.io/socket.io.js"></script>
      <script type="text/javascript">
        var socket = io.connect('http://localhost');
        socket.on('data', function (data) {
          var pjs = Processing.getInstanceById('twitter-sentiment-sink');
          pjs.addDatum(data.text.split(' ')[0]);
        });
      </script>
  • @ennui2342 www.morat.co.uk polis.ecafe.org