SplunkLive! New York Dec 2012 - SNAP Interactive


  • SNAP is located down the street. Our flagship product is AreYouInterested.com, one of the largest social discovery sites on the web, with more than 5M monthly active users (MAU). We are a publicly traded company, and most of our revenue comes from subscriptions.
  • I lead the architecture team. We build the core infrastructure of the site, explore new technologies, and work closely with ops. This includes delivering analytics platforms for the product team and monitoring/debugging tools for developers.
  • We use Splunk for a lot of things; today I only have time to highlight a few of the more interesting ones. Specifically, I'll explain what we give Splunk and how, the monitoring capabilities it gives us, and, most interesting, the analytics we can now perform.
  • So what do we index? Most importantly, our custom application log, which contains structured event data. We also profile parts of our application and log the results. And of course, we index our error logs.
  • When writing logs from our application, we use a centralized logging class that gives us some simple but highly beneficial functionality.
  • Any time the structured data contains a uid, we add key information about that user to the event. We considered lookup tables, but this is much more performant.
  • This information gives us tremendous power. We can...
  • As I mentioned, we also log performance metrics: we simply wrap parts of our code with timers and log the time spent working.
  • So that's pretty much what we give Splunk. What does it give us?
  • With regard to monitoring: we are a continuous deployment shop, deploying site changes 15-50 times a day.
  • Our goal with monitoring is to know immediately if we break something. For performance, we use realtime background searches, each of which powers multiple dashboard views.
  • We also have a frontend error monitoring dashboard.
  • We also monitor email and performance very closely.
  • In an attempt to maximize engagement, we recently started scheduling all of our email sends. Now, in addition to monitoring sends, we need to monitor the health of the schedule queue.
  • When a server gets hot, one of the first things we try to do is correlate.
  • Analysis lets us make data-driven product decisions. The product team does lots of analysis, and we use summary indexing for performance. We recently upgraded to Splunk 5.0, a smooth upgrade with 6 minutes of downtime, and we will continue to use summary indexing.
  • These mysterious unlabeled lines are an example of how we can monitor open and click rates for our emails, targeting specific templates, countries, genders, ISPs, or outbound IP addresses.
  • This is one of the most interesting dashboards we have. It lets us perform cohort analysis on our users: we track a group that joined on a particular day and measure their lifetime behavior.

    1. High Velocity Intelligence Application Monitoring with Splunk • SNAP Interactive, Inc. • Presented by: Nicholas DiSanto, Architecture Team Lead
    2. Company Overview • SNAP Interactive, Inc. • www.AreYouInterested.com • Believes it is one of the largest social discovery platforms on the web (based on monthly active users) • More than 5 million monthly active users • Over 1 billion total pieces of structured data from its users • Synced to millions of Facebook profiles • Receives over 1,000 real-time updates per minute on like actions from Facebook www.snap-interactive.com
    3. About Nick • Developing on the LAMP stack at tech startups for 10 years • Leading a team of core engineers • Passionate about experimentation & data-driven iteration • Striving to eliminate all technical blockers to speed and innovation • @NicholasDiSanto
    4. Summary • We use Splunk for many, many things! • Today, I will share some of our more interesting applications • How we get data into Splunk • What we do with that data • Various types of monitoring • Extensive user behavior analysis
    5. What We Give Splunk • Custom application logs • Structured, minified event data • De-normalized user demographics • Application profiling data • Error logs
    6. Sending Splunk Data • Centralize logging functions and: • Format arbitrary structured data into Splunk-extractable field/value pairs: field="value" • Normalize and minify field names • Detect user_id and augment logs • Optionally log a percentage of events • Target different log files (error, info, analytics)
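The centralized logging functions listed above can be sketched as a small Python module. This is a minimal illustration, not SNAP's actual class (their stack was LAMP): the alias map, the in-memory demographics store, and the in-memory `LOG_FILES` targets are all hypothetical stand-ins for real field conventions, a user database, and log files on disk.

```python
import random
import re
from typing import Any, Dict, List, Mapping, Optional

# Hypothetical alias map used to minify verbose field names.
FIELD_ALIASES = {"user_id": "uid", "email_template": "tpl", "country": "cc"}

# Hypothetical stand-in for the user store that supplies demographics.
USER_DEMOGRAPHICS = {
    42: {"gender": "f", "seeking": "m", "country": "US", "dob": "1985-03-14"},
}

# Hypothetical in-memory stand-ins for the error, info and analytics log files.
LOG_FILES: Dict[str, List[str]] = {"error": [], "info": [], "analytics": []}


def normalize_field(name: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics to '_', then apply the alias map."""
    key = re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")
    return FIELD_ALIASES.get(key, key)


def normalize(data: Mapping[str, Any]) -> Dict[str, Any]:
    return {normalize_field(k): v for k, v in data.items()}


def augment(event: Dict[str, Any]) -> Dict[str, Any]:
    """If the event carries a uid, merge in that user's demographics."""
    demographics = USER_DEMOGRAPHICS.get(event.get("uid"))
    if demographics:
        # Fields already present on the event win over demographic fields.
        return {**normalize(demographics), **event}
    return event


def format_pairs(event: Mapping[str, Any]) -> str:
    """Render space-separated field="value" pairs, escaping backslashes and quotes."""
    parts = []
    for field, value in event.items():
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{field}="{text}"')
    return " ".join(parts)


def log_event(target: str, data: Mapping[str, Any], sample_pct: float = 100.0) -> Optional[str]:
    """Normalize, augment, optionally sample, and append one event line to a target log."""
    if target not in LOG_FILES:
        raise ValueError(f"unknown log target: {target}")
    if sample_pct < 100.0 and random.random() * 100.0 >= sample_pct:
        return None  # dropped by sampling
    line = format_pairs(augment(normalize(data)))
    LOG_FILES[target].append(line)
    return line
```

For example, `log_event("analytics", {"User ID": 42, "Email Template": "welcome"})` appends `gender="f" seeking="m" cc="US" dob="1985-03-14" uid="42" tpl="welcome"`. Normalizing before augmenting means the uid check only has to look for one canonical key.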
    7. User Demographics • Our analytics log contains application events triggered by real users • We augment these event logs with useful demographic data to classify the events ✦ Gender ✦ Seeking gender ✦ Country ✦ Ethnicity ✦ Date of birth
    8. Demographic Power • By augmenting event logs with user demographics, we can perform powerful and detailed analysis of user behavior • Target analysis at countries, genders, or age ranges • Classify events by days since: registration, login, email open, etc. • ...and much more
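Classifying an event by age range and by days since registration needs only the date of birth and registration date carried on the augmented event. A minimal sketch, assuming ISO-formatted `dob`, `reg_date` and `event_date` fields (hypothetical names) and illustrative age bands that the talk does not specify:

```python
from datetime import date
from typing import Dict, Union

# Hypothetical age bands; the talk only says "age ranges".
AGE_BANDS = [(18, 24), (25, 34), (35, 44), (45, 54)]


def age_on(dob: date, on: date) -> int:
    """Whole years between dob and on; a birthday not yet reached this year counts one less."""
    years = on.year - dob.year
    if (on.month, on.day) < (dob.month, dob.day):
        years -= 1
    return years


def age_band(dob: date, on: date) -> str:
    age = age_on(dob, on)
    if age < AGE_BANDS[0][0]:
        return f"under_{AGE_BANDS[0][0]}"
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return f"{AGE_BANDS[-1][1] + 1}+"


def classify(event: Dict[str, str]) -> Dict[str, Union[str, int]]:
    """Derive an age band and days since registration from ISO date fields."""
    on = date.fromisoformat(event["event_date"])
    dob = date.fromisoformat(event["dob"])
    registered = date.fromisoformat(event["reg_date"])
    return {"age_band": age_band(dob, on), "days_since_reg": (on - registered).days}
```

The same `(on - earlier).days` pattern applies to days since last login or last email open.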
    9. Performance Metrics • We time key algorithms in our application and log: • server name • query name • time spent working • This lets us graph the average, min, and max times of these algorithms per server • We also dark launch features, benchmarking performance prior to official launch.
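Wrapping code in a timer that logs server, query name and elapsed time, then aggregating min/avg/max per server, might look like the following sketch. `TIMING_LOG` is a hypothetical in-memory stand-in for the profiling log, and `summarize` only mimics locally what a Splunk `stats min(time_ms) avg(time_ms) max(time_ms) by server, query` search would compute over the indexed lines.

```python
import re
import socket
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical stand-in for the profiling log file.
TIMING_LOG: List[str] = []

PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


@contextmanager
def timed(query_name: str, server: Optional[str] = None) -> Iterator[None]:
    """Time the wrapped block and log server, query name and elapsed milliseconds."""
    server = server or socket.gethostname()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        TIMING_LOG.append(f'server="{server}" query="{query_name}" time_ms="{elapsed_ms:.3f}"')


def summarize(lines: Iterable[str]) -> Dict[Tuple[str, str], Tuple[float, float, float]]:
    """Map (server, query) to (min, avg, max) time in milliseconds."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for line in lines:
        fields = dict(PAIR_RE.findall(line))
        groups[(fields["server"], fields["query"])].append(float(fields["time_ms"]))
    return {key: (min(v), mean(v), max(v)) for key, v in groups.items()}
```

Usage is a one-line wrapper around the code being measured, e.g. `with timed("match_query"): run_match_query()`, where both names are hypothetical. Logging in `finally` records the time even if the wrapped block raises.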
    10. Performance • Average query time for key algorithms by server
    11. What Splunk Gives Us • Monitoring - to measure application health • Analysis - to drive future product decisions • AB test evaluation - to validate hypotheses • Detection - to find patterns & classify users
    12. Monitoring • With continuous deployment, detailed monitoring is absolutely essential. • With each deploy, we watch changes in: • Realtime classified error graphs • Core event stat graphs • We also monitor email deliverability, revenue, and performance (although not on every deploy)
    13. Error & Event Monitoring • Alert us immediately after a deploy if something has gone wrong • Use realtime background searches • Single dashboard with multiple graphs and tables • We are exploring realtime SMS alerts to the ‘developer on call’ • Use historical data to identify min/max expected thresholds (weighted averages: same time of day, same day of week) • Detect consistent deviations and alert
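One way to read "weighted averages: same time of day, same day of week" plus "detect consistent deviations" is sketched below. This is an illustration under stated assumptions, not the team's actual alerting logic: the 0.6 weekday weight, the ±25% band, and the three-sample run length are all hypothetical tuning values.

```python
from statistics import mean
from typing import Sequence, Tuple


def expected_value(same_weekday: Sequence[float], same_time_of_day: Sequence[float],
                   weekday_weight: float = 0.6) -> float:
    """Weighted blend of historical averages for the current time slot.

    same_weekday: values at this time of day on the same weekday in prior weeks.
    same_time_of_day: values at this time of day on recent prior days.
    """
    return weekday_weight * mean(same_weekday) + (1.0 - weekday_weight) * mean(same_time_of_day)


def thresholds(expected: float, tolerance: float = 0.25) -> Tuple[float, float]:
    """Min/max expected values as a symmetric percentage band around the expectation."""
    return expected * (1.0 - tolerance), expected * (1.0 + tolerance)


def consistent_deviation(observed: Sequence[float], bands: Sequence[Tuple[float, float]],
                         min_run: int = 3) -> bool:
    """True once min_run consecutive observations fall outside their band."""
    run = 0
    for value, (low, high) in zip(observed, bands):
        if value < low or value > high:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0  # a single in-band sample resets the run
    return False
```

Requiring a run of out-of-band samples, rather than alerting on a single spike, is what keeps one noisy interval from paging the developer on call.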
    14. Error Monitoring • Count of all errors: past 30 seconds
    15. Error Monitoring • All errors: past 5 minutes, with deploys
    16. Error Monitoring • Rolled-up errors: past 5 minutes
    17. Error Monitoring • Rolled-up filtered errors: past 5 minutes
    18. Error Monitoring • Rolled-up JS errors: past 3 hours
    19. Event Monitoring • We monitor ~20 event stats in realtime on each deploy
    20. Event Monitoring • Overview and detail views, powered by a single realtime background search
    21. Monitor Email & Performance • Email • Deliverability is essential to the business • Need to maximize engagement • Performance • What async jobs may be contributing to high DB load? • What performance are end users experiencing? • Are particular servers overloaded?
    22. Email Deliverability • Overview of key metrics
    23. Email Monitoring • Inserts into the email scheduled send queue
    24. Performance • Asynchronous process timers • Can correlate spikes with site issues
    25. Analysis • We heavily leverage summary indexing for performance gains • Daily rollups are grouped judiciously, giving us fast, flexible analysis over long periods • We summarize revenue, email deliverability, core KPI, and general stats data • Custom dashboards facilitate easy searching • Lots of ad hoc searching by the product team
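The idea behind daily rollups is to collapse raw events into one count per day and per chosen grouping, so long-range questions scan a few summary rows instead of every event. In Splunk this is done with scheduled searches that write into a summary index; the Python below only illustrates the grouping logic, and the `ts`, `event` and `cc` field names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Sequence


def daily_rollup(events: Iterable[Dict[str, str]],
                 group_by: Sequence[str] = ("event", "cc")) -> List[Dict[str, Any]]:
    """Count events per (day, *group_by) key; each key becomes one summary row."""
    counts: Counter = Counter()
    for e in events:
        day = e["ts"][:10]  # assumes ISO-8601 timestamps such as "2012-12-06T14:03:00"
        counts[(day,) + tuple(e.get(f, "unknown") for f in group_by)] += 1
    return [
        {"day": key[0], **dict(zip(group_by, key[1:])), "count": n}
        for key, n in sorted(counts.items())
    ]


def total(rows: Iterable[Dict[str, Any]], start_day: str, end_day: str, **filters: str) -> int:
    """Answer a long-range question from summary rows only."""
    return sum(
        r["count"] for r in rows
        if start_day <= r["day"] <= end_day and all(r.get(k) == v for k, v in filters.items())
    )
```

Choosing `group_by` is the "grouped judiciously" trade-off: every extra dimension multiplies the number of summary rows, but a dimension left out can't be filtered on later.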
    26. Email Analysis • Sends, opens, clicks, bounces & FBL (feedback loop) rates by email type
    27. Email Analysis • Monitor changes in open & click rates by email, ISP, country, etc.
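Open, click, bounce and FBL rates by ISP (or by country, template, etc.) can be derived from summed daily counts. A minimal sketch, assuming hypothetical summary rows with `sends`, `opens`, `clicks`, `bounces` and `fbl` counts. Here every rate is taken per send; a click-to-open rate would divide clicks by opens instead.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping

METRICS = ("opens", "clicks", "bounces", "fbl")


def email_rates(rows: Iterable[Mapping[str, object]], by: str = "isp") -> Dict[str, Dict[str, float]]:
    """Sum counts per value of `by`, then divide each metric by total sends."""
    totals: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        totals[row[by]].update({k: row[k] for k in ("sends",) + METRICS})
    rates = {}
    for key, t in totals.items():
        sends = t["sends"]
        rates[key] = {m: (t[m] / sends if sends else 0.0) for m in METRICS}
    return rates
```

Summing counts first and dividing last avoids averaging daily percentages, which would over-weight low-volume days.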
    28. Email Analysis • Analysis dashboard *Sample data
    29. Core KPI Dashboard • Powerful targeted cohort analysis *Sample data
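Cohort analysis groups users by the day they joined and tracks what fraction of each group is active on each subsequent day. The sketch below uses retention as the lifetime behavior being measured; that choice, the input shapes, and the `max_day` horizon are assumptions for illustration, and the real dashboard may track other KPIs.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def cohort_retention(registrations: Dict[Hashable, date],
                     activity: Iterable[Tuple[Hashable, date]],
                     max_day: int = 7) -> Dict[date, List[float]]:
    """For each registration-day cohort, the fraction of users active on day 0..max_day."""
    cohorts: Dict[date, Set[Hashable]] = defaultdict(set)
    for uid, registered in registrations.items():
        cohorts[registered].add(uid)

    active: Dict[Tuple[date, int], Set[Hashable]] = defaultdict(set)
    for uid, day in activity:
        registered = registrations.get(uid)
        if registered is None:
            continue  # activity from a user outside the registration data
        offset = (day - registered).days
        if 0 <= offset <= max_day:
            active[(registered, offset)].add(uid)  # a set, so repeat events count once

    return {
        registered: [len(active[(registered, d)]) / len(users) for d in range(max_day + 1)]
        for registered, users in sorted(cohorts.items())
    }
```

Because demographics are already on every event, the same function can be run on a filtered slice (one country, one gender) to get the "targeted" cohorts the slide mentions.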
    30. AB Test Results • We are constantly running a variety of AB experiments on our live users • We divide our user population into nine 10% segments and ten 1% segments • Each segment can be targeted with an experiment • All event logs are annotated with the appropriate AB experiment name • This allows us to measure behavior changes between experiment and control groups
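Nine 10% segments plus ten 1% segments cover exactly 100 buckets, so one way to implement the split is to hash each user into a stable bucket 0-99. The talk does not say how users are assigned; the SHA-256 hashing, the salt, the segment names, and the `EXPERIMENTS` mapping below are all hypothetical.

```python
import hashlib
from typing import Dict

# Hypothetical experiment assignments per segment; unassigned segments get "none".
EXPERIMENTS: Dict[str, str] = {"pct10_0": "ctrl", "pct10_1": "my_test", "pct1_3": "tiny_test"}


def bucket(user_id: int, salt: str = "ab-v1") -> int:
    """Stable pseudo-random bucket 0-99 for a user."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def segment(user_id: int, salt: str = "ab-v1") -> str:
    b = bucket(user_id, salt)
    if b < 90:
        return f"pct10_{b // 10}"  # buckets 0-89: nine 10% segments
    return f"pct1_{b - 90}"        # buckets 90-99: ten 1% segments


def ab_field(user_id: int, salt: str = "ab-v1") -> str:
    """Value to write into the AB field of every event this user triggers."""
    return EXPERIMENTS.get(segment(user_id, salt), "none")
```

Hashing keeps a user in the same segment across requests without storing assignments; changing the salt reshuffles everyone when segments need to be reused for a fresh experiment.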
    31. Easy AB Analysis • All event logs contain an AB field • This identifies the experiment group of the user at that point in time • Fully integrated into core analysis dashboards • Ad hoc analysis becomes simple: `my_search` (AB=my_test OR AB=ctrl) | `my_reporting_command` by AB
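The search pattern above filters to the test and control groups and then reports a metric split by AB. The Python sketch below mirrors that shape on already-parsed events; the `uid`/`event`/`AB` field names follow the slides, while the per-user-count metric and the relative-lift formula are assumptions. It does no significance testing, which a real evaluation would need.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Sequence, Set


def metric_per_user(events: Iterable[Mapping[str, object]], metric_event: str,
                    groups: Sequence[str] = ("my_test", "ctrl")) -> Dict[str, float]:
    """Average count of metric_event per distinct user, for each AB group."""
    users: Dict[str, Set[object]] = defaultdict(set)
    hits: Dict[str, int] = defaultdict(int)
    for e in events:
        ab = e.get("AB")
        if ab not in groups:
            continue  # same filter as (AB=my_test OR AB=ctrl)
        users[ab].add(e["uid"])
        if e.get("event") == metric_event:
            hits[ab] += 1
    return {g: (hits[g] / len(users[g]) if users[g] else 0.0) for g in groups}


def relative_lift(per_user: Mapping[str, float], test: str = "my_test", control: str = "ctrl") -> float:
    """(test - control) / control for the chosen metric."""
    return per_user[test] / per_user[control] - 1.0
```

Because the AB value is stamped on each event at write time, a user who changes groups later is still attributed correctly for events logged before the change.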
    32. AB Dashboards • Key metrics: experiment vs. control group
    33. Latest Splunking - Detection • The way our users interact with one another is insightful • We can use this data to classify users: • Identifying “attractive” users • Identifying spammers & scammers • We test hypotheses with ad hoc searches • Find reliable patterns, then set up scheduled searches that interface with MySQL • This data then feeds into our application in various ways
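Once a pattern is validated, a scheduled job can turn it into flags the application reads from the database. The sketch below is purely illustrative: the "many messages, almost no replies" rule and its thresholds are hypothetical placeholders, and the standard-library `sqlite3` module stands in for MySQL (MySQL would use `INSERT IGNORE` where SQLite uses `INSERT OR IGNORE`).

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def flag_suspected_spammers(messages: Iterable[Tuple[int, int, bool]],
                            min_sent: int = 50, max_reply_rate: float = 0.02) -> List[int]:
    """Flag senders with at least min_sent messages and a reply rate <= max_reply_rate.

    messages: (sender_uid, recipient_uid, replied) tuples. The rule and both
    thresholds are hypothetical placeholders for a validated pattern.
    """
    sent: Dict[int, int] = defaultdict(int)
    replies: Dict[int, int] = defaultdict(int)
    for sender, _recipient, replied in messages:
        sent[sender] += 1
        replies[sender] += int(replied)
    return sorted(uid for uid, n in sent.items()
                  if n >= min_sent and replies[uid] / n <= max_reply_rate)


def store_flags(conn: sqlite3.Connection, uids: Iterable[int],
                label: str = "suspected_spammer") -> None:
    """Idempotently write flags to a table the application can read."""
    conn.execute("CREATE TABLE IF NOT EXISTS user_flags ("
                 "uid INTEGER NOT NULL, label TEXT NOT NULL, PRIMARY KEY (uid, label))")
    conn.executemany("INSERT OR IGNORE INTO user_flags (uid, label) VALUES (?, ?)",
                     [(uid, label) for uid in uids])
    conn.commit()
```

Making the write idempotent matters for scheduled searches, since overlapping runs will report the same users repeatedly.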
    34. Contact Us • SNAP Interactive, Inc. www.snap-interactive.com • Nicholas DiSanto, Architecture Team Lead, ndisanto@snap-interactive.com, 301-BIG-TREE, @NicholasDiSanto • Lindsay Bubbico