Esperwhispering
Using Esper to extract information from real-time data streams.



Esperwhispering: Presentation Transcript

  • Esperwhispering: Using Esper to Find Problems in Real-time Data / Real-time and real(ly) big
  • Who am I? @postwait on Twitter. Author of “Scalable Internet Architectures” (Pearson, ISBN: 067232699X); contributor to “Web Operations” (O’Reilly, ISBN: 1449377440). Founder of OmniTI, Message Systems, Fontdeck, & Circonus. I like to tackle problems that are “always on” and “always growing.” I am an engineer and a practitioner of academic computing: IEEE member, Senior ACM member, on the editorial board of ACM’s Queue magazine, and on the ACM professions board.
  • What is BigData? • Few agree. • I say it is any data-related problem that can’t be solved (well) on one machine. • Never use a distributed system to solve a problem that can be easily solved on a single system: • performance • simplicity • debuggability
  • Framing the data problem • events... to make it web related, let’s say it is web activity • for every user action, we have an event • an event is composed of about 20-30 known attributes (say ~400 bytes) • url, referrer, site category, • ip address, ASN, geo location info, • user-perceived performance info (like load time)
  • Framing the volume problem • We see 100 of these per second on a site • Easy problem (more or less) • We run SaaS, so we need to support 2000 customers: • 200,000 events/second (or 30x = 6,000,000 column appends/second)
  • What do we want? • I want answers, dammit • I would like to know what is slow (or fast) by • ASN • geo location • browser type • I’d also like to know, given an event: • is it outside the average ± 2σ • over the last 5 minutes
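The “outside the average ± 2σ over the last 5 minutes” test can be sketched in plain JavaScript — a toy illustration of the criterion, not the talk’s code; the `{t, v}` sample shape is an assumption:

```javascript
// Toy sketch of "is this value outside mean +/- 2*stddev of the last
// five minutes of samples". `samples` is an array of {t, v} observations
// (t = timestamp in ms, v = e.g. load time).
function pruneWindow(samples, now, windowMs = 5 * 60 * 1000) {
  // keep only observations inside the trailing window
  return samples.filter(x => now - x.t >= 0 && now - x.t < windowMs);
}

function isOutlier(samples, value, k = 2) {
  const n = samples.length;
  if (n < 2) return false; // too little data to judge
  const mean = samples.reduce((s, x) => s + x.v, 0) / n;
  const variance = samples.reduce((s, x) => s + (x.v - mean) ** 2, 0) / n;
  return Math.abs(value - mean) > k * Math.sqrt(variance);
}
```

The rest of the talk shows how Esper expresses exactly this kind of check declaratively, without hand-maintaining the window.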
  • What else do we want? • I want answers now, dammit (defined: not later)
  • What is real-time? • The correctness of the answer depends on both the logical correctness of the result and the temporal proximity of the result to the question. • hard real-time: old answers are worthless. • soft real-time: old answers are worth less.
  • Real-time on the Internet • Hard real-time systems on the Internet; this sort of thing ain’t my bag, baby! • Someone is just going to get hurt.
  • Soft real-time? • We need soft real-time systems any time we are going to react to a user. • If the answer is either wrong or late, it is less relevant to them. • The problems we look at have temporal constraints ranging from 5 seconds (counters and statistics) to 1 second (fraud detection) to 10 milliseconds (user-action reaction) and everywhere in between.
  • Enter CEP • Complex Event Processing... • Queries always running. • Tuples introduced. • Tuples emitted. • EsperTech’s Esper is my hero.
  • Typical (OmniTI) Esper deployment: custom Java glue. (diagram: Application / Infrastructure / Cloud)
  • More concretely • node.js listens for web requests and submits data to Esper via AMQP • Esper runs “magic” • The output of that magic is pushed back via AMQP • node.js listens and returns data over JSONP.
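As a rough sketch of the first hop: the node.js listener just shapes each request into a JSON event before publishing it over AMQP. The field names follow the event schema on the next slide, but this helper (and the `req`/`geo`/`timings` inputs) are hypothetical:

```javascript
// Hypothetical sketch: shape an incoming web hit into the JSON event that
// the node.js front end would publish to Esper over AMQP (transport omitted).
function buildHitEvent(req, geo, timings) {
  return {
    _ls_part: req.clientToken,          // client token identifying the customer
    url_schema: req.protocol,
    url_host: req.host,
    url: req.path,
    ip: req.ip,
    asn: geo.asn,
    asn_orgname: geo.asnOrgname,
    geoip_country_code: geo.countryCode,
    load_time: timings.loadTime,        // user-perceived load time, ms
  };
}
```

A real deployment would serialize this with `JSON.stringify` and hand it to an AMQP client; the point here is only the shape of the message.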
  • What our event really looks like:

        {
          // Client Token
          _ls_part:             { type: String },
          // HTTP Info
          url_schema:           { type: String },
          url_host:             { type: String },
          url:                  { type: String },
          referrer_schema:      { type: String },
          referrer_host:        { type: String },
          referrer_path:        { type: String },
          ip:                   { type: String },
          method:               { type: String },
          http_version:         { type: String },
          browser:              { type: String },
          browser_version:      { type: String },
          map_id:               { type: String },
          // User Location
          asn:                  { type: Integer },
          asn_orgname:          { type: String },
          geoip_longitude:      { type: Double },
          geoip_latitude:       { type: Double },
          geoip_country_code:   { type: String },
          geoip_continent_code: { type: String },
          geoip_region:         { type: String },
          geoip_metro_code:     { type: Integer },
          geoip_country:        { type: String },
          geoip_city:           { type: String },
          geoip_area_code:      { type: Integer },
          // User Perceived Performance Data
          red_time:             { type: Double },
          dns_time:             { type: Double },
          con_time:             { type: Double },
          req_start:            { type: Double },
          res_start:            { type: Double },
          res_end:              { type: Double },
          dom_time:             { type: Double },
          load_time:            { type: Double }
        }
  • First steps for simplicity • I want to create a view on 30 minutes of data for a specific client and populate that view with those “hit” events:

        create window fl9875309_hit30m.win:time(30 minute) as hit
        insert into fl9875309_hit30m select * from hit(_ls_part='fl9875309')

    • Some useful thoughts: • data flowing into this window: “istream” • data also flowing out of this window (after 30 minutes): “rstream” • if you are interested in both streams, we call it: “irstream”
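The istream/rstream distinction can be illustrated with a toy time window in JavaScript — a simulation of the concept, not Esper itself:

```javascript
// Toy time window: an event inserted now is part of the "istream"; once it
// ages past windowMs it is emitted once more as the "rstream", mirroring
// the behavior of Esper's win:time(...) view.
class TimeWindow {
  constructor(windowMs) {
    this.windowMs = windowMs;
    this.events = [];
  }
  // Advance logical time to `now`, optionally inserting one event.
  // Returns { istream, rstream } for this step.
  advance(now, value) {
    const rstream = this.events.filter(e => now - e.t >= this.windowMs);
    this.events = this.events.filter(e => now - e.t < this.windowMs);
    const istream = [];
    if (value !== undefined) {
      const e = { t: now, v: value };
      this.events.push(e);
      istream.push(e);
    }
    return { istream, rstream };
  }
}
```

With a 30-unit window, an event inserted at t=0 appears in the istream immediately and shows up in the rstream of whichever step first advances time past t=30.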
  • Asking a question: • EPL, as you can see, looks much like SQL... so

        select count(*) from fl9875309_hit30m

    • SQLers will be very surprised by the result of this... • ideas? • Hint: this query runs forever and emits results as available • Esper defaults to using the istream of events from which it selects • So: • this statement emits a result on each event entering the window • and the return set is the total number of events within the window • We really wanted:

        select irstream count(*) from fl9875309_hit30m
  • Asking a (cooler) question: • I’d like to know the view volume by referring site... so

        select irstream referrer_host, count(*) as views
        from fl9875309_hit30m
        where referrer_host <> url_host
        group by referrer_host

    • This outputs on any event entering or leaving the window... but, • it only outputs the group that is being updated by the event(s) entering and/or leaving the window... • (perhaps) not so useful
  • Snapshots • Sometimes you want to see the complete state. • Given that we’re async, we can decouple the output from the input. • Let’s get the top 10 referrers, every 5 seconds.

        select irstream referrer_host, count(*) as views
        from fl9875309_hit30m
        where referrer_host <> url_host
        group by referrer_host
        output snapshot every 5 seconds
        order by count(*) desc limit 10
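Outside of EPL the snapshot itself is nothing exotic: per-group counts plus a sort-and-limit when the timer fires. A toy JavaScript analog (illustration only, not how Esper implements it):

```javascript
// Toy analog of "group by referrer_host ... output snapshot every 5 seconds
// order by count(*) desc limit 10": tally views per external referrer, then
// take the top N whenever the snapshot timer fires.
function countView(counts, hit) {
  if (hit.referrer_host !== hit.url_host) {   // skip self-referrals
    counts[hit.referrer_host] = (counts[hit.referrer_host] || 0) + 1;
  }
  return counts;
}

function snapshotTop(counts, limit = 10) {
  return Object.entries(counts)
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([referrer_host, views]) => ({ referrer_host, views }));
}
```

The win over hand-rolling this is that Esper also handles the rstream side: counts decrease as events age out of the 30-minute window.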
  • Finding anomalies... • Note: this is very, very simplistic. • I’d like to break the dataset out by network (AS) • I’d like to find individual hits whose load_time is greater than the average + 3 times the standard deviation • I’d like details about the hit’s IP, browser and load_time

        select asn_orgname, browser_version, ip, load_time,
               average, stddev, datapoints as sample_size
        from fl9875309_hit30m(load_time is not null)
             .std:groupwin(asn_orgname)
             .stat:uni(load_time, ip, browser_version, load_time) as s
        where s.load_time > s.average + 3 * s.stddev
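A toy JavaScript analog of what the grouped stat:uni view is doing: keep a running mean and standard deviation per ASN (here via Welford’s algorithm, an assumption — the slide does not say how Esper computes it) and flag a hit whose load_time exceeds average + 3σ:

```javascript
// Toy per-group anomaly detector: running mean/stddev per asn_orgname
// (Welford's online algorithm), flagging hits with
// load_time > mean + k * stddev based on the stats seen so far.
function makeDetector(k = 3) {
  const groups = new Map(); // asn_orgname -> { n, mean, m2 }
  return function observe(hit) {
    let g = groups.get(hit.asn_orgname);
    if (!g) { g = { n: 0, mean: 0, m2: 0 }; groups.set(hit.asn_orgname, g); }
    const { n, mean, m2 } = g;                      // stats *before* this hit
    const stddev = n > 1 ? Math.sqrt(m2 / n) : 0;
    const anomaly = n > 1 && hit.load_time > mean + k * stddev;
    // Welford update with the new observation
    g.n = n + 1;
    const delta = hit.load_time - mean;
    g.mean = mean + delta / g.n;
    g.m2 = m2 + delta * (hit.load_time - g.mean);
    return anomaly;
  };
}
```

Unlike this sketch, the EPL version above also forgets data as it ages out of the 30-minute window, so the baseline tracks recent behavior rather than all history.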
  • Mapping it all out. • Looking at performance: a world’s-eye view
  • What’s this all mean? • Big data is all relative. • 100 records/s at 400 bytes each is... ~3GB/day or ~1TB/year • 100,000 records/s is... ~3TB/day or 1PB/year • 500,000 records/s is... ~15TB/day or 5PB/year • Which is big data? you choose. • The technology that can act on this in real-time exists and is different from the technologies used to store it and crunch it. • Don’t think big... think efficient.
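The slide’s figures are round numbers, but the order of magnitude checks out from records/sec × ~400 bytes:

```javascript
// Back-of-envelope volume math from the slide: records/sec * ~400 bytes each,
// over one day (86,400 seconds).
function bytesPerDay(recordsPerSec, bytesPerRecord = 400) {
  return recordsPerSec * bytesPerRecord * 86400;
}
```

For example, `bytesPerDay(100)` is about 3.5 GB/day and `bytesPerDay(100000)` about 3.5 TB/day, matching the slide’s ~3GB and ~3TB.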
  • Thank You • Consider attending Surge 2011: discussing scalability matters, because scalability matters. • Thank you!