Uploaded on

 

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
417
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
2
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • Abstract: In 2002, Yahoo selected PHP for Web site development and began to phase out its own proprietary server-side scripting language. Three years later, Michael Radwin reflects on how the switch to PHP offered both technical challenges and productivity increases. The first part of the presentation offers a look inside Yahoo's decision-making process to adopt an open-source scripting language. Radwin addresses why Yahoo selected PHP over other languages, focusing on the performance and stability required to serve billions of page views a day. In the second part, Radwin discusses Yahoo's PHP development methodology, which has enabled its engineers to rapidly implement features while still creating software that is maintainable over long periods of time. Biography: Michael J. Radwin is an engineering manager for Yahoo's Infrastructure Software group. His team develops and supports Web platform technologies such as Apache, PHP, and MySQL, and more recently SOAP/REST toolkits. Radwin has been hacking on Apache since 1998 in high-performance environments and his team has been instrumental in helping Yahoo migrate from proprietary to open source software.
  • Numbers from Q3 2005 Yahoo! Earnings October 18, 2005
  • Compared PHP 4.1.2, mod_perl, yScript (Yahoo proprietary) Pentium III 800Mhz, 512M RAM, FreeBSD 4.3 (average for early 2002) Sample app: 33K input script, 41K output Included and evaluated 3 other files Header, navbar, footer Arithmetic, regex, echo variables Pseudo-personalization (“Hello, mradwin”) A few calls to C++ extension Fetch user profile from profile server Insert advertisements from adserver
  • Yahoo property (sports, finance, personals, etc…) Load balancer - which server can most handle requests coming in based on algorithm (round robin, least connections, etc..) Running on server are bunch of PHP scripts. Can make remote calls to relational databases, or to other web services.
  • Web pages go regular Apache htdocs dir http://login.yahoo.com/config/login?.intl=dk /usr/local/share/htdocs/dk/login.php Business logic goes in PEAR directory /usr/local/share/pear/HTML/Form.php /usr/local/share/pear/Yahoo/Sports/Teams.php
  • Profile with APD to see where your hot spots are. If you see a function being called 8,000 times on one page, that might be a good candidate to port to C Focus on scripts (or include files) that get hit a lot Don’t bother optimizing a script that only gets called once in a while Examples of candidates for extensions Distributed locking i18n Advertisements UDB (user database) Cookies DBM-like flat files Security Input Filtering

Transcript

  • 1. PHP at Yahoo! http://public.yahoo.com/~radwin/ Michael J. Radwin October 20, 2005
  • 2. Outline
    • Yahoo!, as seen by an engineer
    • Choosing PHP in 2002
    • PHP architecture at Yahoo!
  • 3. The Internet’s most trafficked site
  • 4. 25 countries, 13 languages
  • 5. Yahoo! by the Numbers
    • 411M unique visitors per month
    • 191M active registered users
    • 11.4M fee-paying customers
    • 3.4B average daily pageviews
    • October 2005
  • 6.  
  • 7. Engineering Values
    • Security & Privacy
      • We must protect our customers’ information
    • High Availability
      • If the site is offline, we’re missing the opportunity to serve our customers
    • Performance
      • We serve billions of pageviews a day
    • Flexibility & Innovation
      • Customize site for each market
      • Rapid development of new features
  • 8. From Proprietary to Open Source 94 95 96 97 98 99 00 01 02 03 04 05 Web Server Apache “ Filo Server” Web Lang yScript DB Flat Files
  • 9. Choosing a Language How and Why We Selected PHP
  • 10. Choosing PHP: brief history
    • October 2001: 3 proprietary languages
      • Costly to continue to maintain each
      • Limited features (no subroutines!)
    • Committee began researching
      • Compare features, performance
      • Build vs. Buy vs. Open Source
    • PHP selected May 2002
  • 11. Ideal Language Criteria
    • High performance
    • Robust, sand-boxed
    • Language features
      • Loops, conditionals
      • Complex data-types
    • C/C++ extensions
    • Runs on FreeBSD
    • Interpreted or dynamically compiled
    • i18n support
    • Clean separation of presentation/content/app semantics
    • Low training costs
    • Doesn’t require CS degree to use
  • 12. Top 10 Language Choices XSLT yScript mod_include
  • 13. Performance: Requests mod_perl yScript
  • 14. Performance: Memory mod_perl yScript
  • 15. Why we picked PHP
    • Designed for web scripting
    • High performance
    • Large, Open Source community
      • Documentation, easy to hire developers
    • “ Code-in-HTML” paradigm
      • <html>
      • <?php echo &quot;Hello World&quot; ; ?>
      • </html>
    • Integration, libraries, extensibility
    • Tools: IDE, debugger, profiler
  • 16. PHP at Yahoo! Today
  • 17. Yahoo!’s Development Methodology
    • Server Architecture
    • File Layout
    • Dependency Management
    • Security
    • Performance
    • Globalization
  • 18. Server Architecture User Profile Server web server web server Web Server Scripts Load Balancer Ad Server Web Services Apache
  • 19. File Layout
    • HTML Templates
      • /usr/local/share/htdocs/*.php
    • Template Helpers
      • /usr/local/share/htdocs/*.inc
    • Business Logic
      • /usr/local/share/pear/*.inc
    • C/C++ Core Code
      • Data access, Networking, Crypto
    50% HTML 50% PHP 0% HTML 100% PHP 0% HTML 0% PHP 95% HTML 5% PHP
  • 20. Dependency Management
    • Base PHP package depends only on XML parser
      • ./configure --disable-all
    • Self-Contained Extensions
      • mysql, dba, curl, ldap, pcre, gd, iconv
      • To enable
        • Install /usr/local/lib/php/20020429/mysql.so
        • Add “ extension = mysql.so ” to php.ini
      • Avoids unnecessary dependencies
      • Smaller Apache memory footprint
  • 21. Security: INI Settings
    • open_basedir
      • Insurance against /etc/passwd exploits
    • allow_url_fopen = Off
      • Use libcurl extension instead
      • Avoid open proxy exploits
    • display_errors = Off
      • However, log_errors = On
    • safe_mode = Off
      • Intended for shared hosting environment
  • 22. Security: Input Filtering
    • http://search.yahoo.com/search?p=<script+src=http://evil.com/x.js>
    • Cross Site Scripting (XSS) most common attack
      • Also “SQL Injection”
    • Normal approach
      • strip_tags()
      • mysqli_escape_string()
      • Examine every line code
      • Tedious and error-prone
    • Use input_filter hook
      • Sanitize all user-submitted data
      • GET/POST/Cookie
  • 23. Performance: Opcode Caches
    • Easiest performance boost
      • Cache parsed .php scripts in shared memory
      • Optimizations
      • No code modifications!
    • Several products available
      • Zend Performance Suite
      • APC
      • Turck MMCache
  • 24. Performance: PHP Extensions in C++
    • PHP ships with 80 extensions written in C/C++
    • Yahoo! develops its own proprietary extensions
      • Fast execution speed
      • Access to client libraries
    • Longer development cycle
      • Edit, compile, link, debug
      • Manual memory-management
  • 25. Globalization: PHP Unicode
    • Native Unicode support in 2006
    • Collaborative effort
      • Andrei Zmievski (Yahoo!)
      • Andi Gutmans (Zend)
      • Many members of PHP Community
    + + = 6 ICU
  • 26.