
Tweaking Open Source



Published in: Technology

  1. Tweaking Open Source: a case study
      Nelson Gomes, Team Leader, 11th of November 2011
  2. 2. Talk IndexIntroductionOpenX ComponentsImprovements IntroducedOverall ArchitectureReport ServerProblems FoundLinksQ&A
  3. 3. Introduction – Online AdvertisementOnline advertisement coupes with delivering ads;Placing ads in sites is a complex process: Obtain all ads electable for a placeholder; Exclude ads with business limitations like capping; Assure that the ads are beying presented to the target audience; Assure the advertiser goals are being met; Account the ads delivered; Deliver the right ad format;
  4. 4. Introduction – Online AdvertisementExamples
  5. 5. Introduction – Online AdvertisementExamples
  6. 6. Introduction – Online AdvertisementExamples
  7. 7. Introduction – Online AdvertisementExamples
  8. 8. Introduction – OpenXOpen Source advertising server;Licensed under GNU General Public License;Project forked from phpAds developed by Tobias Ratschiller in 1998;Was called phpAdsNew, OpenAds and finally OpenX;Features: Has a web based GUI; Extendable plugins architecture; Serves ads throught JS and Iframes calls mainly;
  9. 9. Introduction – OpenXSupport technologies: PHP; MySQL; Web Server (Apache, Nginx); Optional memcached usage; Filesystem to serve ad content;
  10. 10. Introduction – OpenDisplayStarting from OpenX 1.8.5 version, SAPO OpenDisplay project began;A four-person team started in April 2010 to analyse and improve OpenX capabilities to ensure entire SAPOs ad serving network;In August 2010, OpenDisplay started to serve a major website, while development was undergoing;In February 2011, SAPO began migrating its ad serving network in a process that took about 3 months to complete;Today OpenDisplay serves the entire SAPOs ad network;
  11. 11. Introduction – OpenDisplayOpenDisplay serves ads for several media: Internet; Mobile Internet; Mobile Applications; TV set-top boxes; Connected TVs;In the near future well be serving bulk campaigns for other media;Ill try to tell in this presentation this endeavour steps and quirks;
  12. 12. OpenDisplay Components – FrontendThis component is responsible to serve all ad formats;No data processing is done here due to performance besides adserving itself;The adserving is done using munged PHP scripts for performance;Plugins are included in a on demand basis;Database queries are cached;So its all about ad serving decision making;
  13. 13. OpenDisplay Components – BackendComprises data feature processing;Web based GUI for campaign and ad management;Ad serving statistics;Reporting;Batch processing of ad delivery data for use by the frontends;
  14. 14. OpenDisplay Components – TasksMaintenance Priority Engine (MPE) Determines witch campaigns to serve given their priorities; Calculates ad serving probabilites given its probabilities and corrects them when underperforming or overperforming;Maintenance Statistics Engine (MSE) Processes ad serving numbers; Starts and stops campaigns;
  15. 15. Improvements Introduced - GeneralAdded reusable segmentation rules; This way a rule can be reused in several campaigns; Added compound segmentation rules; Segmentation rules engine was rewritten, cause the previous segmentation system was inadequate;Added the concept of Orders; Sometimes a customer has several goals to different sites; The concept of order allows to place several campaigns with different goals in a single customer order;
  16. 16. Improvements Introduced - GeneralAdded Zone Groups; Instead of selecting placeholders one a at a time we can associate several at once; Imagine that a Run of Network (RON) campaign for all MREC (300x250) placeholders would need to be associated to all placeholders one by one;Added revenue-share acounting; For ads served on pages with third-party content; This way, revenue can be shared with third-party content providers;
  17. 17. Improvements Introduced - GeneralOpenDisplay went through a security audit by SAPOs security team and several issues were solved;Backoffice: UI session cookies are now only delivered over SSL; Session id generation function wasnt good enought and could be easily guessed. This correction minimized session hijacking; New user profiles were added, and entity access was reviewed; Some user profiles were changed to read-only, like advertisers and sites;
  18. 18. Improvements Introduced - GeneralAds uploaded into the ad server are stored in a folder and served upon; At first look there is no problem with this, but over time in some systems this can cause inode exhaustion; So to prevent this, and speed up file retrieval we improved upload component to distribute the files in a two-level folder hierarchy;OpenX can use a content farm to deliver ads, so we use this feature from the start;
  19. 19. Improvements Introduced - GeneralTraffic forecast: OpenX doesnt have a traffic forecast engine, instead it uses an average of ads served; We developed two alternative forecast algorithms using Python; This forecast is critical for a couple of reasons: Inventory selling; Correct impression allocation for campaigns, specially due to targetting rules;
  20. 20. Improvements Introduced - GeneralTraffic forecast example:
  21. 21. Improvements Introduced - GeneralAdded data logging and analysis: We started to summarize delivery properties to allow us to calculate precise segmentation delivery probabilies; Using these numbers in combination with traffic forecast we can estimate the inventory for each campaign and its overall probability of delivery; Also, this information is useful to commercial purposes: Knowing the market is a very valuable information; We are currently migrating some of this data to Hbase that reduces data, making it usable;
  22. 22. Improvements Introduced - GeneralRestructured VAST 1.0 system and upgraded it to 2.0; Video Ad Serving Template (VAST) standard from Interactive Advertising Bureau; Delivers video ads (pre, mid and postrolls); Delivers overlays;We also added a new type of ad that allows us to serve SAPO text ads has images; This virtual ad type works has a proxy to a different ad system, combining two different ad systems; Probably the first time an ad system combined them;
  23. Improvements Introduced - General
      <?xml version="1.0" encoding="UTF-8" standalone="no"?>
      <VAST version="2.0" (...)>
       <Ad id="30324">
        <InLine>
         <AdSystem>OpenDisplay</AdSystem>
         <AdTitle><![CDATA[Teste Vast Video]]></AdTitle>
         <Description><![CDATA[VAST Ad]]></Description>
         <Impression id="OpenDisplay"><![CDATA[]]></Impression>
         <Creatives>
          <Creative id="30324">
           <Linear>
            <TrackingEvents>
             <Tracking event="creativeView"><![CDATA[]]></Tracking>
             (...)
            </TrackingEvents>
            <MediaFiles>
             <MediaFile (...) type="video/x-flv">http://(...)/video.flv</MediaFile>
            </MediaFiles>
           </Linear>
          </Creative>
         </Creatives>
        </InLine>
       </Ad>
      </VAST>
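A consumer of this endpoint, such as a video player, needs to pull the media file URLs and tracking events out of the VAST document. A minimal sketch with Python's standard `xml.etree.ElementTree`, run against a trimmed, hypothetical VAST 2.0 response modelled on the example above; the `example.com` URLs are placeholders.

```python
import xml.etree.ElementTree as ET

VAST_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<VAST version="2.0">
  <Ad id="30324">
    <InLine>
      <AdSystem>OpenDisplay</AdSystem>
      <AdTitle><![CDATA[Teste Vast Video]]></AdTitle>
      <Creatives>
        <Creative id="30324">
          <Linear>
            <TrackingEvents>
              <Tracking event="creativeView"><![CDATA[http://example.com/track]]></Tracking>
            </TrackingEvents>
            <MediaFiles>
              <MediaFile type="video/x-flv">http://example.com/video.flv</MediaFile>
            </MediaFiles>
          </Linear>
        </Creative>
      </Creatives>
    </InLine>
  </Ad>
</VAST>"""


def parse_vast(document):
    """Return (media file URLs, {event: tracking URL}) from a VAST document."""
    root = ET.fromstring(document)
    media = [m.text.strip() for m in root.iter("MediaFile")]
    tracking = {t.get("event"): (t.text or "").strip() for t in root.iter("Tracking")}
    return media, tracking


media, tracking = parse_vast(VAST_XML)
print(media)     # ['http://example.com/video.flv']
print(tracking)  # {'creativeView': 'http://example.com/track'}
```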
  24. 24. Improvements Introduced - GeneralFlash ads are a major problem in some systems that dont support Flash; iPhones and iPads for example;To assure these ads are at all times visible we added automatic Flash ad image generation to ads upload via Backend;This way, even if a Flash ad doesnt have a fallback image, we generate one automatically; This was accomplished using GNUs gnash in combination with xvfb-run that provides a virtual X Window System for gnash to run;
  25. 25. Improvements Introduced - GeneralFuture developments will include bulk campaigns; These campaigns differ from regular campaigns cause we know the characteristics of the audience in advance; Splitting audiance in sets with the same features we can process an entire set within the LP solver at once minimizing the number of variables;So we can optimize the revenue using linear programming solutions; We will use GLPK (GNU Linear Programming Kit) has a solver to obtain an optimal solution; This way we can provide a solution that maximizes a campaigns revenue;
  26. 26. Improvements Introduced - GeneralGLPK sample problem:# Giapettos problem, maximizing Giapettos profitvar x1 >=0;  /* soldier worths 3€  */var x2 >=0;  /* train worths 2€  *//* Objective function */maximize z: 3*x1 + 2*x2; // maximize Giapettos profit/* Constraints */s.t. Finishing : 2*x1 + x2 <= 100; // only 100 hours per weeks.t. Carpentry : x1 + x2 <= 80; // only 80 hours per weeks.t. Demand    : x1 <= 40; // demand of soldiers per weekEnd;
  27. 27. Improvements Introduced - FrontendDatabase write operations were removed. Database access now is read-only;Delivery scripts were analysed using xdebug, and major performance issues were tuned: User agent regexps used by PHPSniff were taking 25% of the entire request time. Using memchache as user agent cache we saved 97% of this time! All ad serving counters are done in memcache and persisted at every minute, soon well migrate this to broker queues; Improved ad caching system, to store and retrieve EVERYTHING in a single operation;
  28. 28. Improvements Introduced - FrontendUsing xdebug output has an input to KCachegrind it is very easy to analyse any PHP script: just run it!Files generated by xdebug are read and analysed by KCachegrind that shows for instance: How many times a function has been called; Total time each function used; Where request time is use;Making very easy to detect and improve any long running script;
  29. 29. Improvements Introduced - FrontendKCachegrind printscreen
  30. 30. Improvements Introduced - FrontendInstead of using an Apache web server we decided to use Nginx with PHP-FPM: Nginx scales almost linearly; PHP-FPM behaved very fast in our tests;PHP-FPM is a FastCGI implementation, now blunded with PHP 5.3.3;Instead of using PHP output compression, we used Nginx compression, witch is faster;Of course, we used a PHP accelerator: eAccelerator with shared memory witch is adequate to PHP-FPM multi- process architecture;
  31. 31. Improvements Introduced - FrontendEven adding new features, we still were able to reduce delivery times:
  32. 32. Improvements Introduced - FrontendIntroduced a cookie abstraction API to allow storing all cookie and session information server-side: OpenX by default stored session information in cookies what was insufficient to keep an entire ad network running due to cookie size limit (~4k); This was a critical issue for long serving campaigns that used capping or conversion data; Less cookies means less bandwidth usage and faster responses;
  33. 33. Improvements Introduced - FrontendThe new session storage mechanism added new issues; The requests had to be sequential to allow correct session retrieval and storage; This required a lock mechanism to obtain session info in an ordered fashion; This was accomplished using memcache atomic increments to lock session access; All sessions are stored in memcache and the complete process of locking, retrieving, storing and unlocking of the session is done in a few ms (<3ms), from remote servers!;
  34. 34. Improvements Introduced - FrontendWe can see in this chart outbound traffic dropped significantly:
  35. 35. Improvements Introduced - FrontendWe introduced zone capping, a feature that wasnt available in OpenX; This feature is very useful with video ads, to avoid user flooding with video ads; Using zone capping we can say that a user will see one or more ads and then will not see any more ads during a given period of time; This feature is managed by placeholder, independently of the campaign settings;
  36. 36. Improvements Introduced - FrontendAdded new delivery endpoints to accomodate new formats: Mobile: Json Xml iPhonePlist TV VASTAlso we developed a SDK to help mobile ads integration: Mobile ads are placed server-side, so client information has to be passed to ad server (client IP, session id, user- agent);
  37. 37. Improvements Introduced - FrontendFrontend delivery algorithm was changed to support: New segmentation rules system; Changed election algorithm; Zone capping; Server-side storage of information instead of cookies; Increased performance; New endpoints to provide new types of ads; No write operations into database; Gather user properties for analysis;
  38. 38. Improvements Introduced - FrontendSome eye opening numbers: More than web requests per month; 9 frontend servers using 36Mbits outbound and 25Mbits inbound, in a total of 61Mbits throughput! Aproximately 2,200 ad requests per second and the twice of web requests (4,400/s); 95% of the web requests replied under 18ms; PHP power at work... :-)
  39. 39. Improvements Introduced - BackendStatistics component was changed to read information from a database replica due large number of accesses;Backoffice changed to support some filters and results paging;All user generated delete operations were removed, why? Removal of a user, due to table relations could delete all campaigns and statistics, and compromise forecast results; Deleting of a campaign, could loose all campaign data, required for billing; So all delete operations are done in maintenance tasks;
  40. 40. Improvements Introduced - BackendWe also added new targetting rules and improved others: Geographical: country, district; Mobile Devices Model, OS, Version; Browser Family; Internet Service Provider; Organization; Day of week;
  41. 41. Improvements Introduced - BackendMPE was changed for a couple of reasons: Become faster; Decrease memory usage; Changes in algorithm; Optimizations;MPE was reading ALL campaigns from database even finished ones, so memory comsuption was increasing linearly;All services are now redundant;
  42. Improvements Introduced - Backend
  43. Overall Architecture
  44. 44. Report ServerOpenX only generates csv reports;A more reliable product required more reliable, comercial- style reports;This need lead us to try out JasperReports, an open-source Java reports generator;Thanks to iReport for Jasper, a Crystal-Reports style report designer as a tool for creating reports, the reports can be easily edited and tested;
  45. Report Server
      iReport for Jasper
  46. 46. Report ServerSo, starting with JasperReports we generated a cloud style report generation farm, how?Combining it with SAPO Broker, a message passing system and a flexible layered architecture;Given this, a report request is a simple message delivered to a SAPO Broker queue;Every server generating reports can consume a report request, allowing this architecture to scale almost linearly;
  47. 47. Report ServerWe developed this report server in a layered style: What report to generate; Report parameters; Datasource to use; Outputs formats (HTML, XLS, Word, PDF,...); Delivery channels (Email, FTP, SSH, …); Report completion notification (HTTP, DB);This layered style architecture allows us to extend any of the layers with new features;Will become available has open-source soon...
  48. Report Server
      Layer 1: what to generate (report & parameters)
      Layer 2: data source (data to use in the report)
      Layer 3: output formats (xls, pdf, doc, rtf...)
      Layer 4: delivery channels (http, db, email)
      Layer 5: completion notification (url, db)
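The five layers can be pictured as independent, pluggable stages, each chosen by name from a registry, so that adding an output format or delivery channel means registering one more function. The sketch below is hypothetical: the registries, the toy CSV renderer and the in-memory "delivery" exist only to show the shape of the design, not the actual report server.

```python
import csv
import io

# Layer 2: data sources, by name.
DATA_SOURCES = {
    "demo": lambda params: [("campaign", "impressions"), ("A", 1200), ("B", 800)],
}


# Layer 3: output formats, by name (a toy CSV renderer).
def render_csv(rows):
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


OUTPUT_FORMATS = {"csv": render_csv}

# Layer 4: delivery channels, by name (here: collect in a list).
OUTBOX = []
DELIVERY_CHANNELS = {"outbox": lambda name, body: OUTBOX.append((name, body))}

# Layer 5: completion notifications, by name.
NOTIFICATIONS = []
NOTIFIERS = {"log": lambda name: NOTIFICATIONS.append(f"{name} done")}


def run_report(request):
    """Layer 1: the request says what to generate and with which parameters."""
    rows = DATA_SOURCES[request["source"]](request["params"])
    body = OUTPUT_FORMATS[request["format"]](rows)
    DELIVERY_CHANNELS[request["delivery"]](request["name"], body)
    NOTIFIERS[request["notify"]](request["name"])


run_report({"name": "weekly", "params": {}, "source": "demo",
            "format": "csv", "delivery": "outbox", "notify": "log"})
print(OUTBOX[0][1].splitlines())  # ['campaign,impressions', 'A,1200', 'B,800']
```

In this design a request message like the one above is exactly what a worker would receive from the queue on the previous slide.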
  49. 49. Problems FoundUnable to scale; Some queries would read an entire database table if existed long-running campaigns; Changed this and acumulated totals in each banner what is easier to sum; Some internal data is still passed on using temporary tables, but not for long...Not fast enough, of course OpenX is good enought for small site advertising, but not for an entire ad network;Some entities were not working properly or were missing due to business requirements;
  50. 50. Problems FoundBut in retrospective OpenX gave us a good starting point...Tweaking open-source code allowed us to: From an existing open-source solution obtain a good base to develop a better solution; Save some costs if we had started for scratch; Gain knowledge about advertisement concepts; Customize new features according to specific needs;So tweaking open-source is a great idea has a base to create good solutions!!!
  51. Q&A
      Thank You
  52. Links