SplunkLive! San Francisco Dec 2012 - Socialize

Transcript

  • 1. Using Splunk to Evaluate 20 Billion Ad Impressions Monthly. Isaac Mosquera, CTO • @imosquera • isaac.mosquera@getsocialize.com
  • 2. A Little Bit About Real-Time Bidding (RTB)
    [Diagram labels: Ad Request, Bid Request, Bid Response, Socialize Bidder, Winning Bidder's Ad, Ad Impression, Ad Click]
    All this needs to happen in less than 100 milliseconds!
  • 3. So what are some of our problems?
    Operational
    ● Evaluating more than 10,000 bid requests per second
    ● Finding which bids take longer than 100 ms
    ● Quickly finding any errors within the system
    ● Problems tracking clicks and impressions mean lost revenue
    Decision Making & Bid Algorithms
    ● Merging RTB data with our Social data
    ● Campaign spending
    ● Campaign efficiency
    ● Dissecting data by:
      ○ apps
      ○ users
      ○ devices
  • 4. Analyzing Big Data Efficiently
    1. Collection
    2. Storage
    3. Analyzation/Aggregation
    4. Retrieval
  • 5. Some Options
    ● RDBMS: SQL functions like count() present problems at scale.
    ● RDBMS: The write volume is too high for a single DB, which would also be a single point of failure.
    ● NoSQL: Would work well for high insert and query volumes; however, we would lose the simple alerting, charting, and reporting dashboards.
    ● Hadoop: Simple querying using Hive; however, it's a new environment to manage, and again we would lose alerting, charting, and reporting.
  • 6. Splunk Fits the Bill
    ● Operational Reporting: Easily identify problems and prevent erroneous spending. When an alert goes off, it triggers a script that shuts off the bidder.
    ● Ad Hoc Queries: Allow us to find patterns in the data to improve our bid algorithms.
    ● Application Reporting: Instantly know campaign metrics for us and our clients. "This has got to be the most thorough mobile campaign report I've ever received, so major props to all of you." - Hipmunk Marketing
    ● Scalability: Adding new RTB service providers means billions of new ad requests. Scaling horizontally is key.
  • 7. Data Collection
    ● Although Splunk works great with unstructured data, we need some structure to make querying easy.
    ● We created a small client to push events to the Splunk indexers.
    ● It is very simple and accepts only two fields: an event name and metadata (a dictionary).
    ● Events are application data such as bid requests, clicks, impressions, and application installs.
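    The slides do not show the client itself, so the following is only a minimal sketch of the "event name plus metadata dictionary" idea. The function names, the JSON layout, and the `time` field are assumptions made for illustration; the `displayed_ad` event name and the `campaign_id` and `price` fields are borrowed from the query on slide 11.

```python
import json
import time


def format_event(name, metadata, timestamp=None):
    """Serialize one application event as a single JSON log line.

    The record layout (time / event / metadata) is an assumption for this
    sketch, not the format used by the original client.
    """
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a dictionary")
    record = {
        "time": time.time() if timestamp is None else timestamp,
        "event": name,
        "metadata": metadata,
    }
    # sort_keys keeps the output stable, which makes lines easy to compare.
    return json.dumps(record, sort_keys=True)


def push_event(sink, name, metadata, timestamp=None):
    """Write one event line to any file-like sink (a log file, a socket wrapper, ...)."""
    sink.write(format_event(name, metadata, timestamp) + "\n")
```

    For example, `push_event(open("events.log", "a"), "displayed_ad", {"campaign_id": 7, "price": 2.5})` appends one structured line that an indexer could pick up; how the real client delivered events to the indexers is not stated in the slides.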
  • 8. What do our logs look like?
  • 9. Storage
    ● Performance and redundancy using new Provisioned IOPS for high I/O
    ● Nightly snapshots to S3
    ● Logs are gzipped by Splunk before being snapshotted, for 70% compression gains.
    ● Continuously indexed by Splunk, so reports can even be done in real time
    [Diagram labels: Socialize Bidder, Splunk Indexer (x2), EBS (x2), S3 Backups]
  • 10. Using Splunk to Analyze Operational Data
    Splunk lets you write MapReduce-style jobs in a SQL-like query language:
      source="nginx-prod.log" | stats avg(ResponseTime) as avg_rtime, p95(ResponseTime) as p95_rtime, stdev(ResponseTime) as stdev_rtime
    Easily digest information through charts
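    To make concrete what the three statistics in that query summarize, here is a rough plain-Python equivalent. It is a sketch, not Splunk's implementation: the function name is hypothetical, the percentile uses the simple nearest-rank rule (Splunk's own percentile calculation may differ), and the standard deviation is the sample standard deviation.

```python
import statistics


def response_time_stats(times_ms):
    """Return the average, 95th percentile (nearest rank), and sample stdev."""
    if len(times_ms) < 2:
        raise ValueError("need at least two samples for a standard deviation")
    ordered = sorted(times_ms)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integers only.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "avg_rtime": statistics.mean(ordered),
        "p95_rtime": ordered[rank - 1],
        "stdev_rtime": statistics.stdev(ordered),
    }
```

    Against the 100 ms deadline from slide 2, the `p95_rtime` value is the interesting one: it shows how slow the slowest 5% of responses are, which an average alone would hide.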
  • 11. Analyzation/Aggregation
      index=ad_events displayed_ad
      | spath
      | bin _time span=1m
      | stats count(displayed_ad) as displays sum(price/1000) as dollars_spent avg(price) as avg_cpm_price by campaign_id _time
      | mysqloutput spec=ads-prod table=ads_analytics insert="campaign_id, stat_date, displays, dollars_spent, avg_cpm_price"
    [Diagram labels: Splunk Indexers (x3), Search Head, RDBMS (Generated Reports)]
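    The aggregation step in that search can be sketched in plain Python to show what each output column means. This is an illustration under assumptions, not the author's pipeline: the event dictionaries and their keys (`time`, `campaign_id`, `price`) are hypothetical stand-ins for the indexed events, and writing the rows to MySQL is left out.

```python
from collections import defaultdict


def aggregate_displays(events):
    """Group displayed-ad events by (campaign_id, minute) and summarize them.

    Each event is a dict with hypothetical keys: "time" (epoch seconds),
    "campaign_id", and "price" (a CPM price, i.e. cost per 1000 impressions).
    """
    buckets = defaultdict(list)
    for event in events:
        # Round the timestamp down to the minute, like `bin _time span=1m`.
        minute = int(event["time"]) // 60 * 60
        buckets[(event["campaign_id"], minute)].append(event["price"])

    rows = []
    for (campaign_id, minute), prices in sorted(buckets.items()):
        rows.append({
            "campaign_id": campaign_id,
            "stat_date": minute,
            "displays": len(prices),
            # A CPM price covers 1000 impressions, so one display costs price/1000.
            "dollars_spent": sum(prices) / 1000,
            "avg_cpm_price": sum(prices) / len(prices),
        })
    return rows
```

    Each returned row has the same five columns as the `insert=` list in the search, so it corresponds to one row of the `ads_analytics` table.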
  • 12. Retrieval
    ● MySQL and Memcache allow for very fast retrieval of aggregated reports.
    ● The bidder uses the aggregated information to make smarter bids.
    [Diagram labels: Socialize Bidder, Cache Cluster (Memcache x3), RDBMS]
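    The slides do not say how the bidder combines the cache and the database. One common arrangement is a cache-aside lookup, sketched below as an assumption: a plain dict stands in for a Memcache client, a callable stands in for the MySQL query, and the class name and key format are hypothetical.

```python
class ReportStore:
    """Cache-aside lookup: try the cache first, fall back to the database."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # dict standing in for a Memcache client
        self.load_from_db = load_from_db  # callable standing in for a MySQL query

    def get_report(self, campaign_id):
        key = f"report:{campaign_id}"
        report = self.cache.get(key)
        if report is None:
            # Cache miss: read the aggregated report and remember it.
            report = self.load_from_db(campaign_id)
            if report is not None:
                self.cache[key] = report
        return report
```

    With this shape, repeated lookups for the same campaign during bidding hit only the cache, which matters when each bid has to complete in under 100 milliseconds. A real deployment would also need an expiry policy so that cached reports pick up newly aggregated data.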
  • 13. Final Architecture
    [Diagram labels: Socialize Bidder, Cache Cluster (Memcache x3), Splunk Indexers (x3), Search Head, RDBMS (Generated Reports), S3 Snapshots]
  • 14. Thank you! isaac.mosquera@getsocialize.com | @imosquera