Tweaking Open Source

Published in: Technology

  1. Tweaking Open-Source: a case study
     Nelson Gomes (nelson.gomes@telecom.pt), Team Leader
     11th of November 2011
  2. Talk Index
     • Introduction
     • OpenX Components
     • Improvements Introduced
     • Overall Architecture
     • Report Server
     • Problems Found
     • Links
     • Q&A
  3. Introduction – Online Advertisement
     • Online advertisement deals with delivering ads;
     • Placing ads on sites is a complex process:
       • Obtain all ads eligible for a placeholder;
       • Exclude ads with business limitations such as capping;
       • Ensure that the ads are being presented to the target audience;
       • Ensure the advertiser's goals are being met;
       • Account for the ads delivered;
       • Deliver the right ad format;
  4. Introduction – Online Advertisement: Examples
  5. Introduction – Online Advertisement: Examples
  6. Introduction – Online Advertisement: Examples
  7. Introduction – Online Advertisement: Examples
  8. Introduction – OpenX
     • Open-source advertising server;
     • Licensed under the GNU General Public License;
     • Project forked from phpAds, developed by Tobias Ratschiller in 1998;
     • Was called phpAdsNew, then OpenAds and finally OpenX;
     • Features:
       • Has a web-based GUI;
       • Extensible plugin architecture;
       • Serves ads mainly through JS and iframe calls;
  9. Introduction – OpenX
     • Supporting technologies:
       • PHP;
       • MySQL;
       • Web server (Apache, Nginx);
       • Optional memcached usage;
       • Filesystem to serve ad content;
  10. Introduction – OpenDisplay
      • The SAPO OpenDisplay project started from OpenX version 1.8.5;
      • A four-person team started in April 2010 to analyse and improve OpenX's capabilities so it could support SAPO's entire ad serving network;
      • In August 2010, OpenDisplay started serving a major website while development was still under way;
      • In February 2011, SAPO began migrating its ad serving network, a process that took about 3 months to complete;
      • Today OpenDisplay serves SAPO's entire ad network;
  11. Introduction – OpenDisplay
      • OpenDisplay serves ads for several media:
        • Internet;
        • Mobile Internet;
        • Mobile applications;
        • TV set-top boxes;
        • Connected TVs;
      • In the near future we'll be serving bulk campaigns for other media;
      • In this presentation I'll try to describe the steps and quirks of this endeavour;
  12. OpenDisplay Components – Frontend
      • This component is responsible for serving all ad formats;
      • For performance reasons, no data processing is done here besides ad serving itself;
      • Ad serving is done using munged PHP scripts for performance;
      • Plugins are included on demand;
      • Database queries are cached;
      • So it's all about ad serving decision making;
  13. OpenDisplay Components – Backend
      • Comprises data feature processing;
      • Web-based GUI for campaign and ad management;
      • Ad serving statistics;
      • Reporting;
      • Batch processing of ad delivery data for use by the frontends;
  14. OpenDisplay Components – Tasks
      • Maintenance Priority Engine (MPE):
        • Determines which campaigns to serve given their priorities;
        • Calculates ad serving probabilities and corrects them when campaigns are underperforming or overperforming;
      • Maintenance Statistics Engine (MSE):
        • Processes ad serving numbers;
        • Starts and stops campaigns;
  15. Improvements Introduced - General
      • Added reusable segmentation rules;
        • This way a rule can be reused in several campaigns;
      • Added compound segmentation rules;
        • The segmentation rules engine was rewritten, because the previous segmentation system was inadequate;
      • Added the concept of Orders;
        • Sometimes a customer has several goals for different sites;
        • The concept of an order allows placing several campaigns with different goals in a single customer order;
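The idea of reusable and compound rules can be illustrated with a minimal Python sketch. It is not OpenDisplay's engine (which the slides do not show); the names `field_equals`, `all_of`, `any_of`, the request fields and the campaign names are all hypothetical.

```python
from typing import Callable, Dict, List

# A rule is a predicate over the request properties (hypothetical model).
Rule = Callable[[Dict[str, str]], bool]

def field_equals(field: str, value: str) -> Rule:
    """Simple rule: the request field must equal the given value."""
    return lambda request: request.get(field) == value

def all_of(*rules: Rule) -> Rule:
    """Compound rule: every sub-rule must match."""
    return lambda request: all(rule(request) for rule in rules)

def any_of(*rules: Rule) -> Rule:
    """Compound rule: at least one sub-rule must match."""
    return lambda request: any(rule(request) for rule in rules)

# Rules are defined once...
in_portugal = field_equals("country", "PT")
on_mobile = any_of(field_equals("os", "iOS"), field_equals("os", "Android"))
mobile_in_portugal = all_of(in_portugal, on_mobile)

# ...and reused by several campaigns (hypothetical campaign names).
campaigns = {
    "campaign_a": mobile_in_portugal,
    "campaign_b": in_portugal,
}

def eligible_campaigns(request: Dict[str, str]) -> List[str]:
    """Return the names of the campaigns whose rule matches the request."""
    return sorted(name for name, rule in campaigns.items() if rule(request))
```

Because `in_portugal` is a value like any other, changing it in one place would change the targeting of every campaign that reuses it.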
  16. Improvements Introduced - General
      • Added Zone Groups;
        • Instead of selecting placeholders one at a time, we can associate several at once;
        • Otherwise, a Run of Network (RON) campaign for all MREC (300x250) placeholders would need to be associated with every placeholder one by one;
      • Added revenue-share accounting;
        • For ads served on pages with third-party content;
        • This way, revenue can be shared with third-party content providers;
  17. Improvements Introduced - General
      • OpenDisplay went through a security audit by SAPO's security team and several issues were solved;
      • Backoffice:
        • UI session cookies are now only delivered over SSL;
        • The session id generation function wasn't good enough and ids could be easily guessed. This correction reduced the risk of session hijacking;
        • New user profiles were added, and entity access was reviewed;
        • Some user profiles were changed to read-only, like advertisers and sites;
  18. Improvements Introduced - General
      • Ads uploaded into the ad server are stored in a folder and served from it;
        • At first glance there is no problem with this, but over time in some systems this can cause inode exhaustion;
        • So to prevent this, and to speed up file retrieval, we improved the upload component to distribute the files in a two-level folder hierarchy;
      • OpenX can use a content farm to deliver ads, and we have used this feature from the start;
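A two-level folder hierarchy can be sketched as follows. The slides do not say how files are assigned to folders; deriving the two levels from an MD5 digest of the file name is an assumption made for illustration, and `sharded_path` is a hypothetical helper.

```python
import hashlib
from pathlib import PurePosixPath

def sharded_path(root: str, filename: str) -> PurePosixPath:
    """Map a file name to root/xx/yy/filename using its MD5 hex digest.

    Assumption: the folder names are the first two pairs of hex digits,
    giving 256 * 256 possible leaf folders.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return PurePosixPath(root) / digest[:2] / digest[2:4] / filename
```

The same file name always maps to the same folder, so the path can be recomputed at serving time without a lookup table.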
  19. Improvements Introduced - General
      • Traffic forecast:
        • OpenX doesn't have a traffic forecast engine; instead it uses an average of ads served;
        • We developed two alternative forecast algorithms using Python;
        • This forecast is critical for a couple of reasons:
          • Inventory selling;
          • Correct impression allocation for campaigns, especially due to targeting rules;
  20. Improvements Introduced - General
      • Traffic forecast example: [chart]
  21. Improvements Introduced - General
      • Added data logging and analysis:
        • We started to summarize delivery properties so we can calculate precise segmentation delivery probabilities;
        • Using these numbers in combination with the traffic forecast we can estimate the inventory for each campaign and its overall probability of delivery;
        • This information is also useful for commercial purposes:
          • Knowing the market is very valuable information;
        • We are currently migrating some of this data to HBase, which reduces the data, making it usable;
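As a rough illustration of combining the two numbers, the sketch below multiplies a per-zone traffic forecast by the probability that a request in that zone matches a campaign's segmentation. The function name, the per-zone breakdown and the sample figures are assumptions, not the formulas OpenDisplay uses.

```python
def estimated_inventory(forecast_by_zone, match_probability_by_zone):
    """Expected matching impressions: forecast times match probability, summed over zones."""
    return sum(
        forecast_by_zone[zone] * match_probability_by_zone.get(zone, 0.0)
        for zone in forecast_by_zone
    )

# Hypothetical figures: forecast impressions and match probabilities per zone.
forecast = {"home": 1000, "sports": 500}
match_probability = {"home": 0.25, "sports": 0.5}
inventory = estimated_inventory(forecast, match_probability)
```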
  22. Improvements Introduced - General
      • Restructured the VAST 1.0 system and upgraded it to 2.0;
        • Video Ad Serving Template (VAST) is a standard from the Interactive Advertising Bureau;
        • Delivers video ads (pre-, mid- and post-rolls);
        • Delivers overlays;
      • We also added a new type of ad that allows us to serve SAPO text ads as images;
        • This virtual ad type works as a proxy to a different ad system, combining two different ad systems;
        • Probably the first time an ad system combined them;
  23. Improvements Introduced - General
      <?xml version="1.0" encoding="UTF-8" standalone="no"?>
      <VAST version="2.0" (...)>
       <Ad id="30324">
        <InLine>
         <AdSystem>OpenDisplay</AdSystem>
         <AdTitle><![CDATA[Teste Vast Video]]></AdTitle>
         <Description><![CDATA[VAST Ad]]></Description>
         <Impression id="OpenDisplay"><![CDATA[http://pub.sapo.pt/lg.php?(...)]]></Impression>
         <Creatives>
          <Creative id="30324">
           <Linear>
            <TrackingEvents>
             <Tracking event="creativeView"><![CDATA[http://pub.sapo.pt/(...)&vast_event=creativeview]]></Tracking>
             (...)
            </TrackingEvents>
            <MediaFiles>
             <MediaFile (...) type="video/x-flv">http://(...)/video.flv</MediaFile>
            </MediaFiles>
           </Linear>
          </Creative>
         </Creatives>
        </InLine>
       </Ad>
      </VAST>
  24. Improvements Introduced - General
      • Flash ads are a major problem on systems that don't support Flash;
        • iPhones and iPads, for example;
      • To ensure these ads are always visible, we added automatic image generation for Flash ads uploaded via the Backend;
      • This way, even if a Flash ad doesn't have a fallback image, we generate one automatically;
        • This was accomplished using GNU's gnash in combination with xvfb-run, which provides a virtual X Window System for gnash to run in;
  25. Improvements Introduced - General
      • Future developments will include bulk campaigns;
        • These campaigns differ from regular campaigns because we know the characteristics of the audience in advance;
        • By splitting the audience into sets with the same features, we can process an entire set in the LP solver at once, minimizing the number of variables;
      • So we can optimize the revenue using linear programming;
        • We will use GLPK (GNU Linear Programming Kit) as a solver to obtain an optimal solution;
        • This way we can provide a solution that maximizes a campaign's revenue;
  26. Improvements Introduced - General
      GLPK sample problem:

      # Giapetto's problem: maximize Giapetto's profit
      var x1 >= 0;  /* soldier, worth 3€ */
      var x2 >= 0;  /* train, worth 2€ */

      /* Objective function */
      maximize z: 3*x1 + 2*x2;  /* Giapetto's profit */

      /* Constraints */
      s.t. Finishing: 2*x1 + x2 <= 100;  /* only 100 hours per week */
      s.t. Carpentry: x1 + x2 <= 80;     /* only 80 hours per week */
      s.t. Demand: x1 <= 40;             /* demand of soldiers per week */

      end;
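To make the sample problem concrete without GLPK, the sketch below solves the same two-variable problem in plain Python by enumerating the corners of the feasible region. This is not how GLPK works internally and only suits tiny problems; it relies on the fact that a linear objective over a bounded feasible region reaches its maximum at a corner.

```python
from itertools import combinations

# Constraints in the form a*x1 + b*x2 <= c, including x1 >= 0 and x2 >= 0.
CONSTRAINTS = [
    (2, 1, 100),   # Finishing: 2*x1 + x2 <= 100
    (1, 1, 80),    # Carpentry: x1 + x2 <= 80
    (1, 0, 40),    # Demand: x1 <= 40
    (-1, 0, 0),    # x1 >= 0
    (0, -1, 0),    # x2 >= 0
]

def profit(x1, x2):
    """Objective function of the sample problem."""
    return 3 * x1 + 2 * x2

def feasible(x1, x2, eps=1e-9):
    """True when the point satisfies every constraint (with a small tolerance)."""
    return all(a * x1 + b * x2 <= c + eps for a, b, c in CONSTRAINTS)

def vertices():
    """Intersect every pair of constraint boundaries and keep the feasible points."""
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never intersect
        x1 = (c1 * b2 - c2 * b1) / det   # Cramer's rule
        x2 = (a1 * c2 - a2 * c1) / det
        if feasible(x1, x2):
            yield x1, x2

best = max(vertices(), key=lambda point: profit(*point))
print(best, profit(*best))  # 20 soldiers and 60 trains, for a profit of 180
```

The optimum sits where the Finishing and Carpentry constraints meet, so both kinds of labour hours are fully used.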
  27. Improvements Introduced - Frontend
      • Database write operations were removed. Database access is now read-only;
      • Delivery scripts were analysed using xdebug, and major performance issues were tuned:
        • User agent regexps used by PHPSniff were taking 25% of the entire request time. Using memcache as a user agent cache we saved 97% of this time!
        • All ad serving counters are kept in memcache and persisted every minute; soon we'll migrate this to broker queues;
        • Improved the ad caching system to store and retrieve EVERYTHING in a single operation;
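The user agent cache can be sketched in Python as below. The production system is PHP with PHPSniff and memcache; here a plain dict stands in for the memcache client, and the three detection patterns are hypothetical simplifications.

```python
import re

# Stand-in for a memcache client: a plain dict keyed by the raw user agent string.
_ua_cache = {}
stats = {"parse_calls": 0}  # counts how often the expensive path runs

# Hypothetical, heavily simplified detection rules.
_BROWSER_PATTERNS = [
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("MSIE", re.compile(r"MSIE (\d+)")),
]

def parse_user_agent(user_agent):
    """Expensive path: try each regular expression in turn."""
    stats["parse_calls"] += 1
    for family, pattern in _BROWSER_PATTERNS:
        match = pattern.search(user_agent)
        if match:
            return {"family": family, "major": int(match.group(1))}
    return {"family": "unknown", "major": 0}

def cached_user_agent(user_agent):
    """Cheap path: return the cached result, parsing only on a cache miss."""
    if user_agent not in _ua_cache:
        _ua_cache[user_agent] = parse_user_agent(user_agent)
    return _ua_cache[user_agent]
```

Since the same user agent strings recur across a very large number of requests, almost every request takes the cheap path.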
  28. Improvements Introduced - Frontend
      • Using xdebug output as input to KCachegrind, it is very easy to analyse any PHP script: just run it!
      • Files generated by xdebug are read and analysed by KCachegrind, which shows for instance:
        • How many times a function has been called;
        • Total time each function used;
        • Where request time is spent;
      • This makes it very easy to detect and improve any long-running script;
  29. Improvements Introduced - Frontend
      • KCachegrind screenshot [image]
  30. Improvements Introduced - Frontend
      • Instead of using an Apache web server we decided to use Nginx with PHP-FPM:
        • Nginx scales almost linearly;
        • PHP-FPM performed very fast in our tests;
      • PHP-FPM is a FastCGI implementation, bundled with PHP since version 5.3.3;
      • Instead of using PHP output compression, we used Nginx compression, which is faster;
      • Of course, we used a PHP accelerator: eAccelerator with shared memory, which is adequate for PHP-FPM's multi-process architecture;
  31. Improvements Introduced - Frontend
      • Even while adding new features, we were still able to reduce delivery times: [chart]
  32. Improvements Introduced - Frontend
      • Introduced a cookie abstraction API to allow storing all cookie and session information server-side:
        • By default OpenX stored session information in cookies, which was insufficient to keep an entire ad network running due to the cookie size limit (~4 KB);
        • This was a critical issue for long-serving campaigns that used capping or conversion data;
        • Fewer cookies mean less bandwidth usage and faster responses;
  33. Improvements Introduced - Frontend
      • The new session storage mechanism introduced new issues;
        • The requests had to be sequential to allow correct session retrieval and storage;
        • This required a lock mechanism to obtain session info in an ordered fashion;
        • This was accomplished using memcache atomic increments to lock session access;
        • All sessions are stored in memcache, and the complete process of locking, retrieving, storing and unlocking the session takes a few ms (<3 ms), from remote servers!
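One way to build a lock on atomic increments is sketched below in Python. The slides do not show the actual protocol, so this is an assumption: a caller holds the lock only when its increment returns 1, and otherwise undoes the increment and retries. `FakeMemcache` is an in-memory stand-in for a memcache client, and `with_session` is a hypothetical helper.

```python
import threading
import time

class FakeMemcache:
    """In-memory stand-in for a memcache client with atomic add/incr/decr."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def add(self, key, value):
        """Store the value only if the key does not exist yet."""
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def incr(self, key):
        with self._mutex:
            self._data[key] += 1
            return self._data[key]

    def decr(self, key):
        with self._mutex:
            self._data[key] -= 1
            return self._data[key]

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def set(self, key, value):
        with self._mutex:
            self._data[key] = value

def with_session(cache, session_id, update, retry_delay=0.0005):
    """Lock the session, apply update(session) and store the result, then unlock."""
    lock_key = "lock:" + session_id
    cache.add(lock_key, 0)               # make sure the counter exists
    while cache.incr(lock_key) != 1:     # 1 means we are the only holder
        cache.decr(lock_key)             # undo our attempt and retry later
        time.sleep(retry_delay)
    try:
        session = cache.get("session:" + session_id) or {}
        cache.set("session:" + session_id, update(session))
    finally:
        cache.decr(lock_key)             # unlock
```

Without the lock, two concurrent requests could both read the same session, and the second write would silently discard the first one's changes.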
  34. Improvements Introduced - Frontend
      • We can see in this chart that outbound traffic dropped significantly: [chart]
  35. Improvements Introduced - Frontend
      • We introduced zone capping, a feature that wasn't available in OpenX;
        • This feature is very useful with video ads, to avoid flooding users with them;
        • Using zone capping we can say that a user will see one or more ads and then will not see any more ads during a given period of time;
        • This feature is managed per placeholder, independently of the campaign settings;
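The zone capping decision can be sketched as a small Python function. How OpenDisplay stores the per-user history is not described, so the list of view timestamps and the parameter names are assumptions.

```python
def should_serve(zone_views, now, cap, period_seconds):
    """Decide whether a user may see another ad in a zone.

    zone_views: timestamps (in seconds) of ads this user already saw in the zone.
    cap: maximum number of ads allowed within the period.
    """
    recent = [t for t in zone_views if now - t < period_seconds]
    return len(recent) < cap
```

For example, with `cap=1` and `period_seconds=3600`, a user who saw a video ad 100 seconds ago gets no ad in that zone, whichever campaign would otherwise have been elected.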
  36. Improvements Introduced - Frontend
      • Added new delivery endpoints to accommodate new formats:
        • Mobile:
          • JSON
          • XML
          • iPhonePlist
        • TV
        • VAST
      • We also developed an SDK to help with mobile ad integration:
        • Mobile ads are placed server-side, so client information has to be passed to the ad server (client IP, session id, user-agent);
  37. Improvements Introduced - Frontend
      • The frontend delivery algorithm was changed to support:
        • New segmentation rules system;
        • Changed election algorithm;
        • Zone capping;
        • Server-side storage of information instead of cookies;
        • Increased performance;
        • New endpoints to provide new types of ads;
        • No write operations into the database;
        • Gathering user properties for analysis;
  38. Improvements Introduced - Frontend
      • Some eye-opening numbers:
        • More than 4,000,000,000 web requests per month;
        • 9 frontend servers using 36 Mbits outbound and 25 Mbits inbound, for a total throughput of 61 Mbits!
        • Approximately 2,200 ad requests per second and twice as many web requests (4,400/s);
        • 95% of the web requests answered in under 18 ms;
        • PHP power at work... :-)
  39. Improvements Introduced - Backend
      • The statistics component was changed to read information from a database replica due to the large number of accesses;
      • The Backoffice was changed to support some filters and results paging;
      • All user-generated delete operations were removed. Why?
        • Removing a user could, due to table relations, delete all campaigns and statistics, and compromise forecast results;
        • Deleting a campaign could lose all campaign data, which is required for billing;
        • So all delete operations are done in maintenance tasks;
  40. Improvements Introduced - Backend
      • We also added new targeting rules and improved others:
        • Geographical: country, district;
        • Mobile devices: model, OS, version;
        • Browser family;
        • Internet Service Provider;
        • Organization;
        • Day of week;
  41. Improvements Introduced - Backend
      • MPE was changed for a couple of reasons:
        • To become faster;
        • To decrease memory usage;
        • Changes in the algorithm;
        • Optimizations;
      • MPE was reading ALL campaigns from the database, even finished ones, so memory consumption was increasing linearly;
      • All services are now redundant;
  42. Improvements Introduced - Backend
  43. Overall Architecture
  44. Report Server
      • OpenX only generates CSV reports;
      • A more reliable product required more reliable, commercial-style reports;
      • This need led us to try out JasperReports, an open-source Java report generator;
      • Thanks to iReport for Jasper, a Crystal Reports-style report designer, the reports can be easily edited and tested;
  45. Report Server
      • iReport for Jasper [screenshot]
  46. Report Server
      • So, starting with JasperReports, we built a cloud-style report generation farm. How?
      • By combining it with SAPO Broker, a message passing system, and a flexible layered architecture;
      • Given this, a report request is a simple message delivered to a SAPO Broker queue;
      • Every server generating reports can consume a report request, allowing this architecture to scale almost linearly;
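The queue-and-consumers pattern described above can be sketched in Python. SAPO Broker's API is not shown in the slides, so the standard library's `queue.Queue` and threads stand in for the broker queue and the report servers; `run_report_farm` and the request format are hypothetical.

```python
import queue
import threading

def run_report_farm(requests, worker_count=3):
    """Feed report requests into one queue consumed by several workers."""
    pending = queue.Queue()
    done = []                       # (worker name, report name) pairs
    done_lock = threading.Lock()

    def worker(name):
        while True:
            request = pending.get()
            if request is None:     # sentinel: no more work for this worker
                break
            # A real worker would render the report here.
            with done_lock:
                done.append((name, request["report"]))

    workers = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in workers:
        thread.start()
    for request in requests:
        pending.put(request)
    for _ in workers:
        pending.put(None)           # one sentinel per worker
    for thread in workers:
        thread.join()
    return done
```

Each request is consumed by exactly one worker, so adding workers adds capacity without any coordination between them.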
  47. Report Server
      • We developed this report server in a layered style:
        • What report to generate;
        • Report parameters;
        • Datasource to use;
        • Output formats (HTML, XLS, Word, PDF,...);
        • Delivery channels (Email, FTP, SSH, …);
        • Report completion notification (HTTP, DB);
      • This layered architecture allows us to extend any of the layers with new features;
      • It will become available as open source soon...
  48. Report Server
      • Layer 1: what to generate (report & parameters)
      • Layer 2: data source (data to use in the report)
      • Layer 3: output formats (XLS, PDF, DOC, RTF...)
      • Layer 4: delivery channels (HTTP, DB, email)
      • Layer 5: completion notification (URL, DB)
  49. Problems Found
      • Unable to scale;
        • Some queries would read an entire database table if there were long-running campaigns;
        • We changed this and accumulated totals per banner, which are easier to sum;
        • Some internal data is still passed on using temporary tables, but not for long...
      • Not fast enough: of course OpenX is good enough for small site advertising, but not for an entire ad network;
      • Some entities were not working properly or were missing, given our business requirements;
  50. Problems Found
      • But in retrospect OpenX gave us a good starting point...
      • Tweaking open-source code allowed us to:
        • Obtain, from an existing open-source solution, a good base to develop a better solution;
        • Save some of the costs we would have had if we had started from scratch;
        • Gain knowledge about advertisement concepts;
        • Customize new features according to specific needs;
      • So tweaking open source is a great idea as a base to create good solutions!!!
  51. Q&A
      Thank You
  52. Links
      • http://www.openx.com
      • http://php-fpm.org
      • http://jasperforge.org/projects/jasperreports
      • http://jasperforge.org/project/ireport
      • http://softwarelivre.sapo.pt/broker
      • http://www.php.net
      • http://nginx.net
      • http://www.gnu.org/s/gnash
      • http://www.gnu.org/s/glpk
