Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26

On Saturday, 26 October, the second external meeting of the Tech Hangout Community took place at Creative Space 12, a cultural and educational center in Kiev! The event was held under the motto «Discover the value of Big Data!»

* Tech Hangout is an event organized by developers for developers to share knowledge and experience. Each session consists of a 30-minute talk on a previously announced topic, followed by a 30-minute roundtable discussion. The initiative proved so popular that within a short time Tech Hangout gained its own logo, a blog and a Facebook group for discussing the talks.

Join the discussion: https://www.facebook.com/groups/techhangout/
Read our blog: http://hangout.innovecs.com/

Transcript

  • 1. Big Data in the advertising industry. Michael Dewhirst: CTO at Captify; co-founder of StrikeAd and DevZeroG
  • 2. Who am I? Born in Moscow, Russia; in the UK from 1991; working in Kiev (from London) since 1999; in IT/software (professionally) since 1994; ex Java, HTML/JS, ABAP/SAP, .NET (shhh...), Notes, etc. developer; working with Big Data since 2010; freediving and rock climbing when not working.
  • 3. Companies. StrikeAd (2010-2013: CTO, co-founder): mobile advertising media DSP (demand-side platform) / trading platform; processing tens of billions of requests/month; several “Big Data” solutions in place; launched (co-founded) in 2010. Captify (2013-now: CTO): search retargeting company; processing tens of billions of requests/month; complex “dual” traffic and data workflow; launched an R&D department 2 months ago.
  • 4. Why is Big Data so key? Pretty much everything in a business revolves around data and understanding it, and every day there is exponentially more data to understand.
  • 5. What is Big Data? What is big data, and what solutions can be classed as such?
  • 6. What is Big Data? “Internet scale”: billions of transactions a month, 2000-5000+ QPS (queries per second).
  • 7. What is Big Data? Processing time of under a second per transaction, usually under 100 ms.
  • 8. What is Big Data? The ability to aggregate, report on and analyse processed data in real time or near real time.
  • 9. What data? Ad slots, impressions, user ID, IP address, clicks, GPS lat/long, actions/conversions, site URL, tracking pixels, data feeds / databases, site category, age, gender, income, connection type (mobile / Wi-Fi), etc.
  • 10. The Challenge
  • 11. The Challenge(s): a lot of volume, which quickly needs retrospective access.
  • 12. Architecture, Design, Solutions
  • 13. Typical architecture. Modules/components: 1. load balancing; 2. the actual processing, by distributed identical workers; 3. logging; 4. ETL (Extract, Transform, Load): processing logs, summarising/aggregating by keys; 5. aggregated data; 6. “Big DataBase” (sometimes x2).
  • 14. Big Data specific features: load balancing. By geo: routing requests to the nearest data centre. By load: usually round robin, evenly distributing traffic between available nodes. DNS or software based (or both).
  • 15. Big Data specific features: storage. RW/RO (read-write / read-only). In-memory only for real-time data (sub-100 ms access); on disk for near-line, non-“real-time” access.
  • 16. Big Data specific features: storage, in-memory (fast), sharding. Splitting data across several nodes (e.g. “A-C” on node1, “D-F” on node2, etc.), because the whole DB does not fit in one server's memory. Hashing request data to determine the storage node. Two-tier architecture: 1) a load-balancing tier evenly distributing traffic between available nodes, where each LB is identical; 2) a data storage tier, where each node only processes relevant requests and only stores its own chunk/shard.
  • 17. Sharding architecture
  • 18. Dynamic scaling. Cloud-based hosting charges are usually time-based; local continental data centres are needed; traffic usually fluctuates significantly over the day, week, month and year; cloud-based hosting allows quick server/instance commissioning and decommissioning, so instances can be added as traffic trends grow.
  • 19. Other areas. Automatic node updating (there can be 100's to manage); monitoring and alerting (load, space, errors, etc.); burn-in: testing new code on a small cluster before upgrading the whole network; good security: firewalls, local user/file access, etc.; avoid having single points of failure; old log near-line storage (e.g. Amazon
  • 20. Architecture, design, solutions: any other “modules”?
  • 21. Machine learning
  • 22. What is machine learning? Automated, algorithmic statistical data analysis and pattern detection
  • 23. What?!
  • 24. Used in advertising? To help find repeatable actions with lowered risk and high expected outcome certainty
  • 25. Meaning... Finding links between ad properties in order to buy more clicks or actions. E.g. ad shown on site A, at lunchtime, ad size 320x600, user from London, etc.: CPC likelihood of 10%. User with an iPhone, in central Kiev, who has been to dance club sites: 30% likelihood of conversion on taxi advertising.
  • 26. Vendors and solutions
  • 27. Vendors and solutions: Apache Hadoop; Google BigQuery; Nginx; Dynamo; Erlang, OTP, etc.; PostgreSQL; Aerospike; MongoDB; Amazon Memcache; XtremeData.
  • 28. Vendors and solutions: DynDNS; Neustar DNS; Neustar Quova Geo DB; Amazon Route 53; Amazon Load Balancing.
  • 29. Real-world examples. Companies that have big data at their core: Google AdX / DoubleClick (online and mobile advertising exchange, ad serving); Criteo.
  • 30. Conclusions. A complex, specialised industry and software development sub-category; technically challenging by an order of magnitude; NOT only for “special” people: anybody can get in (I did); genuinely interesting to work in.
  • 31. Questions?
  • 32. The end. Thank you!
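Slide 6 equates "billions of transactions a month" with roughly 2000-5000+ QPS, and slide 3 quotes tens of billions of requests per month. The arithmetic linking the two is a division by the number of seconds in a month. The sketch below assumes a 30-day month and evenly spread traffic; the peak-to-average factor is a hypothetical parameter, included because slide 18 notes that traffic fluctuates significantly.

```python
# Convert a monthly request volume into queries per second (QPS).
# Assumption: a 30-day month. Real traffic is not uniform, so the
# peak factor used below is a hypothetical knob, not a figure from the talk.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def average_qps(requests_per_month: float) -> float:
    """Average QPS if requests were spread evenly over the month."""
    return requests_per_month / SECONDS_PER_MONTH


def peak_qps(requests_per_month: float, peak_to_average: float) -> float:
    """QPS to provision for when the busiest period runs above the average."""
    return average_qps(requests_per_month) * peak_to_average


if __name__ == "__main__":
    for billions in (5, 10, 20):
        volume = billions * 1_000_000_000
        print(f"{billions} BN/month -> {average_qps(volume):,.0f} QPS average, "
              f"{peak_qps(volume, 2.0):,.0f} QPS at a hypothetical 2x peak")
```

At 5 BN/month this gives about 1,929 QPS on average and at 10 BN/month about 3,858 QPS, in line with the 2000-5000+ QPS range on slide 6.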
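Slide 7 sets a budget of under 100 ms per transaction. Combining that budget with the request rates on slide 6 through Little's law (average requests in flight = arrival rate x time in system) gives a rough figure for how much concurrency the processing tier must hold. This is a back-of-the-envelope sketch; the per-worker concurrency value is a hypothetical input, not a number from the talk.

```python
import math

# Little's law: L = lambda * W
#   L      = average number of requests in flight
#   lambda = arrival rate (requests per second)
#   W      = average time each request spends in the system (seconds)


def in_flight_requests(qps: float, latency_ms: float) -> float:
    """Average number of concurrent requests for a given rate and latency."""
    return qps * latency_ms / 1000.0


def workers_needed(qps: float, latency_ms: float, concurrency_per_worker: int) -> int:
    """Identical workers needed if each can hold `concurrency_per_worker` requests at once."""
    return math.ceil(in_flight_requests(qps, latency_ms) / concurrency_per_worker)


if __name__ == "__main__":
    # 5000 QPS at the 100 ms upper bound -> 500 requests in flight on average.
    print(in_flight_requests(5000, 100))
    # Hypothetical: each worker holds 50 concurrent requests -> 10 workers.
    print(workers_needed(5000, 100, concurrency_per_worker=50))
```

Halving latency halves the concurrency the cluster has to carry at the same QPS, which is one reason the sub-100 ms target matters as much as raw throughput. The figure is an average; bursts and slow outliers need extra headroom.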
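Slide 13 lists ETL (Extract, Transform, Load) as the step that processes logs and summarises/aggregates them by keys. A minimal single-process sketch of that step follows. The tab-separated log format (timestamp, campaign ID, event type) and the hourly aggregation key are assumptions made for illustration; at the volumes on slide 3 such a job would normally run on a distributed framework such as Apache Hadoop (slide 27) rather than in one Python process.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical log format, one event per line:
#   "<ISO timestamp>\t<campaign_id>\t<event>"   e.g. "2013-10-26T14:05:12\tc1\timpression"
AggregateKey = Tuple[str, str, str]  # (hour, campaign_id, event)


def extract(line: str) -> Tuple[str, str, str]:
    """Extract: split one raw log line into its three fields (ValueError if malformed)."""
    timestamp, campaign_id, event = line.rstrip("\n").split("\t")
    return timestamp, campaign_id, event


def transform(timestamp: str, campaign_id: str, event: str) -> AggregateKey:
    """Transform: truncate the timestamp to the hour to form the aggregation key."""
    return timestamp[:13], campaign_id, event  # "2013-10-26T14:05:12" -> "2013-10-26T14"


def aggregate_logs(lines: Iterable[str]) -> Counter:
    """Run extract and transform on each line, then load per-key counts into a Counter."""
    totals: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        try:
            fields = extract(line)
        except ValueError:
            continue  # malformed line; a real pipeline would count or log it
        totals[transform(*fields)] += 1
    return totals


if __name__ == "__main__":
    raw_log = [
        "2013-10-26T14:05:12\tc1\timpression",
        "2013-10-26T14:07:40\tc1\timpression",
        "2013-10-26T14:07:41\tc1\tclick",
        "2013-10-26T15:00:03\tc2\timpression",
        "garbage line",
    ]
    for key, count in sorted(aggregate_logs(raw_log).items()):
        print(key, count)
```

The resulting per-key counts are the "aggregated data" module on slide 13: small enough to query quickly, while the raw logs move to cheaper near-line storage.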
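Slide 14 describes two load-balancing strategies: by geo (route to the nearest data centre) and by load (usually round robin across available nodes). A minimal software-based sketch combining both follows. The region names, node names and the country-to-region table are hypothetical; in practice this role is often played by DNS (e.g. Amazon Route 53 on slide 28) or a dedicated balancer rather than application code.

```python
import itertools
from typing import Dict, Iterator, List

# Hypothetical topology: node names per data-centre region.
DATA_CENTRES: Dict[str, List[str]] = {
    "eu": ["eu-node-1", "eu-node-2", "eu-node-3"],
    "us": ["us-node-1", "us-node-2"],
}

# Hypothetical mapping from a client's country (e.g. from a geo-IP lookup) to a region.
REGION_FOR_COUNTRY: Dict[str, str] = {"UA": "eu", "GB": "eu", "RU": "eu", "US": "us"}
DEFAULT_REGION = "eu"


class RoundRobin:
    """By load: hand out nodes in turn so traffic is spread evenly."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("a balancer needs at least one node")
        self._cycle: Iterator[str] = itertools.cycle(nodes)

    def next_node(self) -> str:
        return next(self._cycle)


class GeoBalancer:
    """By geo: pick the nearest region first, then round robin inside it."""

    def __init__(self, data_centres: Dict[str, List[str]]) -> None:
        self._regions = {region: RoundRobin(nodes) for region, nodes in data_centres.items()}

    def route(self, country_code: str) -> str:
        region = REGION_FOR_COUNTRY.get(country_code, DEFAULT_REGION)
        return self._regions[region].next_node()


if __name__ == "__main__":
    balancer = GeoBalancer(DATA_CENTRES)
    for country in ["UA", "US", "GB", "US", "UA"]:
        print(country, "->", balancer.route(country))
```

Each balancer keeps its own rotation and holds no request data, so several identical copies can run side by side, matching the "each LB is identical" point on slide 16.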
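Slide 15 splits storage into an in-memory tier for real-time data (sub-100 ms access) and an on-disk tier for near-line access. A minimal sketch of that split puts a bounded in-memory LRU cache in front of an on-disk SQLite table. SQLite and least-recently-used eviction are stand-ins chosen for illustration, not the stores the talk used (slide 27 lists e.g. Aerospike, MongoDB and PostgreSQL).

```python
import os
import sqlite3
import tempfile
from collections import OrderedDict
from typing import Optional


class TieredStore:
    """Recent/hot keys in a bounded in-memory LRU; every key persisted on disk."""

    def __init__(self, db_path: str, memory_capacity: int = 1000) -> None:
        self.memory: "OrderedDict[str, str]" = OrderedDict()
        self.capacity = memory_capacity
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv (item_key TEXT PRIMARY KEY, item_value TEXT)"
        )

    def _remember(self, key: str, value: str) -> None:
        self.memory[key] = value
        self.memory.move_to_end(key)  # mark as most recently used
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # evict the least recently used key

    def put(self, key: str, value: str) -> None:
        # Write-through: disk holds the durable copy, memory the fast copy.
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT OR REPLACE INTO kv (item_key, item_value) VALUES (?, ?)", (key, value)
            )
        self._remember(key, value)

    def get(self, key: str) -> Optional[str]:
        if key in self.memory:  # real-time path: in-memory hit
            self.memory.move_to_end(key)
            return self.memory[key]
        # Near-line path: fall back to disk, then keep the key hot in memory.
        row = self.db.execute("SELECT item_value FROM kv WHERE item_key = ?", (key,)).fetchone()
        if row is None:
            return None
        self._remember(key, row[0])
        return row[0]

    def close(self) -> None:
        self.db.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "store.db")
    store = TieredStore(path, memory_capacity=2)
    for key in ["a", "b", "c"]:
        store.put(key, key.upper())
    print(list(store.memory))  # ['b', 'c']: "a" was evicted from memory
    print(store.get("a"))      # A (read back from disk)
    store.close()
```

The memory capacity is the knob that trades server RAM against how often requests fall through to the slower near-line tier.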
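Slide 16 gives two ways of deciding which node holds a key: ranges (e.g. "A-C" on node1, "D-F" on node2) and hashing the request data. The sketch below shows both, plus the two-tier layout in which stateless, identical routers forward each request to the single storage node that owns its shard. The node names, the third range and the use of MD5 as the hash are assumptions for illustration.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Range sharding: the first letter of the key decides the node.
# node1/node2 follow the slide's example; node3 is hypothetical.
RANGES: List[Tuple[str, str, str]] = [("A", "C", "node1"), ("D", "F", "node2"), ("G", "Z", "node3")]


def range_shard(key: str) -> str:
    first = key[:1].upper()
    for low, high, node in RANGES:
        if low <= first <= high:
            return node
    raise KeyError(f"no range covers key {key!r}")


def hash_shard(key: str, node_count: int) -> int:
    """Hash sharding: a stable hash of the key, modulo the number of nodes.

    hashlib is used instead of Python's built-in hash(), which is salted per
    process for strings and would send the same key to different nodes.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


class StorageNode:
    """Data tier: keeps only its own shard, in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}


class Router:
    """Load-balancing tier: holds no data, so any number of identical copies can run."""

    def __init__(self, nodes: List[StorageNode]) -> None:
        self.nodes = nodes

    def _owner(self, key: str) -> StorageNode:
        return self.nodes[hash_shard(key, len(self.nodes))]

    def put(self, key: str, value: str) -> None:
        self._owner(key).data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._owner(key).data.get(key)


if __name__ == "__main__":
    print(range_shard("Alice"), range_shard("Dmitry"), range_shard("Kiev"))  # node1 node2 node3
    nodes = [StorageNode(f"node{i}") for i in range(1, 4)]
    router = Router(nodes)
    for user_id in ["user-1", "user-2", "user-3", "user-4"]:
        router.put(user_id, "segment:sports")
    print({node.name: sorted(node.data) for node in nodes})
```

Plain modulo hashing remaps most keys when a node is added or removed; consistent hashing is a common alternative when nodes come and go, as with the dynamic scaling on slide 18.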
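Slide 18 argues that time-based cloud billing combined with strongly fluctuating traffic makes it worth adding and removing instances as load changes. The sketch below sizes a cluster for each block of the day from a traffic profile and compares billable instance-hours against sizing for the peak all day. The traffic profile, per-instance capacity, 30% headroom and minimum of two instances (so one instance is never a single point of failure, slide 19) are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical daily traffic profile: (hours in block, average QPS during block).
DAILY_PROFILE: List[Tuple[int, int]] = [(6, 800), (6, 1500), (6, 4000), (6, 5000)]
QPS_PER_INSTANCE = 500  # hypothetical capacity of one instance


def instances_needed(qps: int, qps_per_instance: int, headroom_pct: int = 30, minimum: int = 2) -> int:
    """Instances to serve `qps` with `headroom_pct`% spare capacity, never below `minimum`."""
    numerator = qps * (100 + headroom_pct)
    denominator = 100 * qps_per_instance
    required = -(-numerator // denominator)  # integer ceiling division
    return max(minimum, required)


def instance_hours(profile: List[Tuple[int, int]], qps_per_instance: int, dynamic: bool) -> int:
    """Billable instance-hours per day: scale per block (dynamic) or size for the peak all day."""
    per_block = [instances_needed(qps, qps_per_instance) for _, qps in profile]
    peak = max(per_block)
    return sum(hours * (count if dynamic else peak) for (hours, _), count in zip(profile, per_block))


if __name__ == "__main__":
    print("dynamic:", instance_hours(DAILY_PROFILE, QPS_PER_INSTANCE, dynamic=True))   # 186
    print("static: ", instance_hours(DAILY_PROFILE, QPS_PER_INSTANCE, dynamic=False))  # 312
```

With this made-up profile, scaling per block needs 186 instance-hours a day versus 312 when the peak size runs around the clock; the real saving depends entirely on how uneven the actual traffic is.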
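Slide 19's "burn in" step tests new code on a small cluster before upgrading the whole network. One common way to do that is a canary release: send a small, stable slice of traffic to the new build and promote it only if its error rate stays close to the existing cluster's. The talk does not say how burn-in was implemented; the hashing scheme, 5% slice, 1.5x tolerance and 1000-request minimum below are hypothetical choices.

```python
import hashlib


def in_canary(request_id: str, canary_percent: int) -> bool:
    """True for a stable ~canary_percent% slice of request IDs (same ID -> same answer)."""
    digest = hashlib.sha1(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < canary_percent


def safe_to_promote(canary_errors: int, canary_requests: int,
                    baseline_errors: int, baseline_requests: int,
                    tolerance: float = 1.5, min_requests: int = 1000) -> bool:
    """Promote only after enough burn-in traffic and an error rate close to the baseline's."""
    if canary_requests < min_requests or baseline_requests == 0:
        return False
    canary_rate = canary_errors / canary_requests
    baseline_rate = baseline_errors / baseline_requests
    return canary_rate <= baseline_rate * tolerance


if __name__ == "__main__":
    ids = [f"req-{i}" for i in range(10_000)]
    share = sum(in_canary(i, 5) for i in ids) / len(ids)
    print(f"canary share: {share:.1%}")
    print(safe_to_promote(1, 5_000, 20, 100_000))   # True: 0.02% vs 0.02% baseline
    print(safe_to_promote(10, 5_000, 20, 100_000))  # False: 0.2% vs 0.02% baseline
```

Hashing the request (or user) ID rather than picking randomly keeps the same user on the same build for the whole burn-in, which makes errors easier to attribute.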
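Slides 22-25 describe machine learning as finding links between ad properties (site, time of day, ad size, user location, device) and the likelihood of a click or conversion. One common way to do that is logistic regression over one-hot encoded properties. The talk does not say which technique was used, so the sketch below is an illustrative stand-in, trained with plain stochastic gradient descent on a made-up toy dataset.

```python
import math
import random
from typing import Dict, List, Tuple

Features = Dict[str, str]  # e.g. {"site": "a", "time": "lunch", "size": "320x600"}
Example = Tuple[Features, int]  # (ad properties, 1 if clicked else 0)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def one_hot(features: Features) -> List[str]:
    """{"site": "a"} -> ["site=a"]: one binary indicator per property value."""
    return [f"{name}={value}" for name, value in sorted(features.items())]


class ClickModel:
    """Logistic regression: P(click) = sigmoid(bias + sum of weights of active indicators)."""

    def __init__(self) -> None:
        self.bias = 0.0
        self.weights: Dict[str, float] = {}

    def predict(self, features: Features) -> float:
        z = self.bias + sum(self.weights.get(k, 0.0) for k in one_hot(features))
        return sigmoid(z)

    def fit(self, examples: List[Example], epochs: int = 50,
            learning_rate: float = 0.1, seed: int = 0) -> None:
        rng = random.Random(seed)
        data = list(examples)
        for _ in range(epochs):
            rng.shuffle(data)
            for features, label in data:
                # Log-likelihood gradient for the bias and each active indicator is (label - p).
                error = label - self.predict(features)
                self.bias += learning_rate * error
                for k in one_hot(features):
                    self.weights[k] = self.weights.get(k, 0.0) + learning_rate * error


if __name__ == "__main__":
    # Toy data (made up): site "a" at lunchtime is clicked 3 times in 10; site "b" never.
    data: List[Example] = (
        [({"site": "a", "time": "lunch"}, 1)] * 3
        + [({"site": "a", "time": "lunch"}, 0)] * 7
        + [({"site": "b", "time": "lunch"}, 0)] * 10
    )
    model = ClickModel()
    model.fit(data)
    print(f"site a: {model.predict({'site': 'a', 'time': 'lunch'}):.2f}")
    print(f"site b: {model.predict({'site': 'b', 'time': 'lunch'}):.2f}")
```

The learned probability for site "a" should land near the 30% click rate in the toy data, while site "b" is pushed towards zero; with real bid-request data the same idea is what turns "ad properties" into a likelihood a bidder can price.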