SplunkLive! New York Dec 2012 - SNAP Interactive

  • SNAP is located down the street. Our flagship product is AreYouInterested.com, one of the largest social discovery sites on the web, with more than 5 million monthly active users. We are a publicly traded company, and most of our revenue comes from subscriptions.
  • I lead the architecture team. We build the core infrastructure of the site, explore new technologies, and work closely with ops. This includes delivering analytics platforms for product and monitoring/debugging tools for developers.
  • We use Splunk for a lot of things; today I only have time to highlight a few of the more interesting ones. Specifically, I’ll explain what we give Splunk and how, the monitoring capabilities it gives us and, most interesting, the analytics we can now perform.
  • So what do we index? Most importantly, our custom application log, which contains structured... data. We also profile parts of our application and log that. And of course we index our error_logs.
  • When writing logs from our application, we use a centralized logging class that gives us some simple but highly beneficial functionality.
  • Any time the structured data contains a uid, we integrate key information about that user. We considered lookup tables, but this is much more performant.
  • This information gives us tremendous power. We can...
  • As I mentioned, we also log performance metrics. We simply wrap parts of our code with timers and log the time spent working.
  • So that’s pretty much what we give Splunk. What does it give us?
  • With regard to monitoring, we are a continuous deployment shop, deploying site changes 15-50 times a day.
  • Our goal with monitoring is to know immediately if we break something. For performance, we use realtime background searches that each power multiple dashboard views.
  • We also have a frontend error monitoring dashboard.
  • We also monitor email and performance very closely.
  • In an attempt to maximize engagement, we recently started scheduling all of our email sends. Now, in addition to monitoring sends, we need to monitor the health of the schedule queue.
  • When a server gets hot, one of the first things we try to do is correlate
  • We make data-driven product decisions. The product team does lots of analysis, and we use summary indexing for performance. We recently upgraded to Splunk 5.0, which went smoothly with 6 minutes of downtime; we will continue to use summary indexing.
  • These mysterious unlabeled lines are an example of how we can monitor open and click rates for our emails, targeting specific templates, countries, genders, ISPs, or outbound IP addresses.
  • This is one of the most interesting dashboards we have. It allows us to perform cohort analysis on our users, tracking a group that joined on a particular day and measuring their lifetime behavior.

Transcript

  • 1. High Velocity Intelligence Application Monitoring with Splunk SNAP Interactive, Inc. Presented by: Nicholas DiSanto Architecture Team Lead
  • 2. Company Overview • SNAP Interactive, Inc. • www.AreYouInterested.com • Believes it is one of the largest social discovery platforms on the web (based on monthly active users) • More than 5 million monthly active users • Over 1 billion total pieces of structured data from its users • Synced to millions of Facebook profiles • Receives over 1,000 real-time updates per minute on like actions from Facebook www.snap-interactive.com
  • 3. About Nick • Developing on LAMP stack at tech startups for 10 years • Leading a team of core engineers • Passionate about experimentation & data driven iteration • Striving to eliminate all technical blockers to speed and innovation • @NicholasDiSanto
  • 4. Summary • We use Splunk for many, many things! • Today, I will share some of our more interesting applications • How we get data into Splunk • What we do with that data • Various types of monitoring • Extensive user behavior analysis
  • 5. What We Give Splunk • Custom application logs • Structured, minified, event data • De-normalized user demographics • Application profiling data • Error logs
  • 6. Sending Splunk Data • Centralize logging functions and: • Format arbitrary structured data into Splunk-extractable field/value pairs: field="value" • Normalize and minify field names • Detect user_id and augment logs • Optionally log a percent of events • Target different log files (error, info, analytics)
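
The logging steps on slide 6 can be sketched in Python roughly as follows. This is only an illustration, not SNAP's actual logging class: the alias table, the `lookup_demographics` stand-in, and the function and parameter names are all assumptions made for the example.

```python
import random

# Hypothetical short field names; the real mapping is not given in the talk.
FIELD_ALIASES = {"user_id": "uid", "event_name": "ev", "gender": "g", "country": "c"}


def lookup_demographics(uid):
    """Stand-in for a user-profile lookup (assumed, not from the talk)."""
    return {"gender": "f", "country": "US"}


def format_event(fields, sample_percent=100, rng=random.random):
    """Return a key="value" log line, or None if the event is sampled out."""
    if rng() * 100 >= sample_percent:
        return None
    data = dict(fields)
    if "user_id" in data:
        # Augment at write time rather than relying on search-time lookups.
        data.update(lookup_demographics(data["user_id"]))
    parts = []
    for key, value in data.items():
        name = FIELD_ALIASES.get(key, key)  # minify known field names
        text = str(value).replace('"', "'")  # keep the quoting unambiguous
        parts.append(f'{name}="{text}"')
    return " ".join(parts)


print(format_event({"event_name": "profile_view", "user_id": 42}))
```

Choosing the target log file (error, info, analytics) is left out here; in this sketch the caller would decide where to write the returned line.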
  • 7. User Demographics • Our analytics log contains application events, triggered by real users • We augment these event logs with useful demographic data to classify the events ✦ Gender ✦ Seeking gender ✦ Country ✦ Ethnicity ✦ Date of birth
  • 8. Demographic Power • By augmenting event logs with user demographics we can perform powerful and detailed analysis of user behavior • Target analysis at countries, genders, or age ranges • Classify events by days since: registration, login, email open, etc. • ...and much more
  • 9. Performance Metrics • We time key algorithms in our application, and log: • server name • query name • time spent working • This lets us graph the average, min and max times of these algorithms per server • We also dark launch features, benchmarking performance prior to official launch.
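
Wrapping code in a timer and logging server name, query name, and time spent, as slide 9 describes, might look like the Python sketch below. The field names (`srv`, `q`, `ms`), the `timed` helper, and the use of `print` as the log sink are assumptions for illustration only.

```python
import socket
import time
from contextlib import contextmanager


@contextmanager
def timed(query_name, emit=print, clock=time.perf_counter, server=None):
    """Log server, query name, and elapsed milliseconds for the wrapped code."""
    server = server or socket.gethostname()
    start = clock()
    try:
        yield
    finally:
        # Runs even if the wrapped code raises, so slow failures are logged too.
        elapsed_ms = (clock() - start) * 1000.0
        emit(f'srv="{server}" q="{query_name}" ms="{elapsed_ms:.1f}"')


with timed("match_candidates"):
    sum(range(100_000))  # placeholder for the real work being profiled
```

Lines in this shape could then be charted per server with average, min, and max aggregations over the `ms` field.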
  • 10. Performance • Average query time for key algorithms by server
  • 11. What Splunk Gives Us • Monitoring - to measure application health • Analysis - to drive future product decisions • AB test evaluation - to validate hypotheses • Detection - to find patterns & classify users
  • 12. Monitoring • With continuous deployment, detailed monitoring is absolutely essential. • Each deploy we watch changes in: • Realtime classified error graphs • Core event stat graphs • We also monitor email deliverability, revenue, and performance (although not every deploy)
  • 13. Error & Event Monitoring • Alert us immediately after deploy if something has gone wrong • Use realtime background searches • Single dashboard with multiple graphs and tables • We are exploring realtime SMS alerts to the ‘developer on call’ • Use historical data to identify min/max expected thresholds (weighted averages: same time of day, same day of week) • Detect consistent deviations and alert
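
One way to read the threshold idea on slide 13 is sketched below in Python. The talk does not give the weights, the width of the expected band, or what counts as a "consistent" deviation, so `weekday_weight`, `tolerance`, and `min_consecutive` are assumed values chosen only to make the example concrete.

```python
def weighted_average(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def expected_band(same_time_recent_days, same_time_same_weekday,
                  weekday_weight=2.0, tolerance=0.5):
    """Return (low, high) expected counts for the current time slot.

    Samples from the same weekday in earlier weeks count more heavily than
    samples from the same time of day on other recent days (assumed weights).
    """
    values = list(same_time_recent_days) + list(same_time_same_weekday)
    weights = ([1.0] * len(same_time_recent_days)
               + [weekday_weight] * len(same_time_same_weekday))
    center = weighted_average(values, weights)
    return center * (1 - tolerance), center * (1 + tolerance)


def is_consistent_deviation(recent_counts, band, min_consecutive=3):
    """True only if the last min_consecutive samples all fall outside the band."""
    low, high = band
    tail = recent_counts[-min_consecutive:]
    return len(tail) == min_consecutive and all(c < low or c > high for c in tail)


band = expected_band([100, 100], [130])
print(band, is_consistent_deviation([110, 200, 210, 190], band))
```

Requiring several consecutive out-of-band samples is one simple way to avoid alerting on a single noisy data point.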
  • 14. Error Monitoring • Count of all errors : past 30 seconds
  • 15. Error Monitoring • All errors: past 5 minutes w/deploys
  • 16. Error Monitoring • Rolled up errors: past 5 minutes
  • 17. Error Monitoring • Rolled up filtered errors: past 5 minutes
  • 18. Error Monitoring • Rolled up JS errors: past 3 hours
  • 19. Event Monitoring • We monitor ~20 event stats, in realtime, each deploy
  • 20. Event Monitoring • Overview and detail views, powered by a single realtime background search
  • 21. Monitor Email & Performance • Email • Deliverability is essential to business • Need to maximize engagement • Performance • What async jobs may be contributing to high DB load? • What performance are end users experiencing? • Are particular servers overloaded?
  • 22. Email Deliverability • Overview of key metrics
  • 23. Email Monitoring • Inserts into email scheduled send queue
  • 24. Performance • Asynchronous process timers • Can correlate spikes with site issues
  • 25. Analysis • We heavily leverage summary indexing for performance gains • Daily rollups are grouped judiciously, giving us fast, flexible, analysis over long periods • We summarize: revenue, email deliverability, core KPI, and general stats data • Custom dashboards facilitate easy searching • Lots of ad hoc searching by product team
  • 26. Email Analysis • Sends, opens, clicks, bounces & FBL rates by email type
  • 27. Email Analysis • Monitor changes in open & click rates by email, ISP, country, etc.
  • 28. Email Analysis • Analysis dashboard *Sample data
  • 29. Core KPI Dashboard • Powerful targeted cohort analysis *Sample data
  • 30. AB Test Results • We are constantly running a variety of AB experiments on our live users • We divide our user population into nine 10% segments and ten 1% segments • Each segment can be targeted with an experiment • All event logs are annotated with the appropriate AB experiment name • This allows us to measure behavior changes between experiment and control groups
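
The split on slide 30 (nine 10% segments plus ten 1% segments, covering 100% of users) could be implemented along the lines of this Python sketch. Bucketing by `uid % 100`, the segment labels, and the `EXPERIMENTS` table are assumptions; the talk does not say how users are assigned to segments.

```python
def segment_for(uid):
    """Map a user id to one of nine 10% segments or ten 1% segments."""
    bucket = uid % 100  # assumed bucketing scheme, 0..99
    if bucket < 90:
        return f"ten_{bucket // 10}"  # ten_0 .. ten_8, each 10% of users
    return f"one_{bucket - 90}"  # one_0 .. one_9, each 1% of users


# Hypothetical assignment of experiments to segments.
EXPERIMENTS = {"ten_3": "my_test", "ten_4": "ctrl"}


def ab_field(uid):
    """Value to write into the AB field of every event logged for this user."""
    return EXPERIMENTS.get(segment_for(uid), "none")


print(segment_for(35), ab_field(35), ab_field(1045), segment_for(193))
```

Because the AB value is written into each event at log time, a later search can compare the experiment and control groups directly on that field.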
  • 31. Easy AB Analysis • All event logs contain an AB field • This identifies the experiment group of the user at that point in time • Fully integrated into core analysis dashboards • Ad hoc analysis becomes simple `my_search` (AB=my_test OR AB=ctrl) | `my_reporting_command` by AB
  • 32. AB Dashboards • key metrics: experiment vs control group
  • 33. Latest Splunking - Detection • The way our users interact with one another is insightful • We can use this data to classify users: • Identifying “attractive” users • Identifying spammers & scammers • We test hypotheses with ad hoc searches • Find reliable patterns then set up scheduled searches that interface with MySQL • This data then feeds into our application in various ways
  • 34. Contact Us • SNAP Interactive, Inc. www.snap-interactive.com • Nicholas DiSanto Architecture Team Lead ndisanto@snap-interactive.com 301-BIG-TREE @NicholasDiSanto • Lindsay Bubbico www.snap-interactive.com